---
language:
- fr
- en
library_name: transformers
tags:
- dpo
- post-training
- french
- alignment
- model-merging
- qwen3
- chocolatine
- comparia
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- jpacifico/comparia-dpo-pairs-bt-6k
- jpacifico/french-orca-dpo-pairs-revised
---
# Chocolatine-2-4B-Instruct-DPO-v2.1

**Chocolatine-2-4B-Instruct-DPO-v2.1** is a post-trained version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), designed to improve instruction-following, reasoning, and overall performance in French, while preserving strong multilingual capabilities.

In my evaluation setup, it delivers consistent gains across the tested French benchmarks, pointing to a broad improvement in French capabilities.

Although the post-training pipeline focuses on French preference data, no degradation is observed on English tasks, and slight improvements are sometimes seen, suggesting positive cross-lingual transfer.

Optimized variants (MLX, GGUF) are also available, making the model particularly suitable for local inference.
## Model Overview

- **Base model:** Qwen/Qwen3-4B-Instruct-2507
- **Parameters:** 4.0B
- **Context length:** 262,144 tokens natively
- **Post-training methods:** DPO + model merging

Note: this model supports only non-thinking mode and does not generate `<think></think>` blocks in its outputs.
This design is consistent with the goals of the post-training setup, which favors a compact dense instruct model focused on direct generation efficiency and practical downstream use.
For use cases requiring explicit reasoning traces or structured thinking outputs, Qwen/Qwen3.5-4B (thinking mode) is recommended.
**Model Variants**

- Chocolatine-2-4B-Instruct-DPO-v2.1 (this repo): contains the retrainable weights in BF16 format
- Quantized GGUF versions: [Q4_K_M](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q4_K_M-GGUF) / [Q8_0](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q8_0-GGUF), and more from mradermacher [here](https://huggingface.co/mradermacher/Chocolatine-2-4B-Instruct-DPO-v2.1-GGUF)
- MLX (optimized for Apple silicon): [4Bit](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-mlx-4Bit) / [8Bit](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-mlx-8Bit)

**Ollama**: in addition to the Hugging Face release, quantized 4-bit and 8-bit variants are also available [here](https://ollama.com/jpacifico/chocolatine-2.1) on Ollama for convenient local inference.
## Benchmarks

The results indicate a consistent improvement across the tested French benchmarks, covering several capability types. This suggests a broad gain in French performance, while English results remain overall stable.
| Benchmark (fr) | Qwen3-4B-Instruct-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
|---|---:|---:|
| gpqa-fr:diamond | 28.93 | **32.49** |
| french_bench_arc_challenge | 47.13 | **49.79** |
| french_bench_grammar | 70.59 | **72.27** |
| french_bench_boolqa | 88.76 | **89.89** |
| french_bench_hellaswag | 56.99 | **58.03** |
| global_mmlu_fr | 63.75 | **64.75** |
| xwinograd_fr | 66.27 | **67.47** |
| fr_mt_bench | 6.22 | **6.44** |
*FR-MT-Bench* evaluation is performed on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), using [multilingual-mt-bench](https://github.com/jpacifico/multilingual_mt_bench) with OpenAI/GPT-5 as the LLM judge.
*global_mmlu_fr*, *xwinograd_fr* and *french_bench* results were obtained using [EleutherAI LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) in a **0-shot** evaluation setting.
*gpqa-fr:diamond* was evaluated with LightEval/vLLM via the [kurakurai/Luth](https://github.com/kurakurai/Luth.git) evaluation pipeline.
| Benchmark (eng) | Qwen3-4B-Instruct-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
|---|---:|---:|
| arc_challenge | **58.79** | 58.45 |
| hellaswag | 69.08 | **70.16** |
| boolq | 84.80 | **85.32** |
| gpqa_diamond_zeroshot | **38.89** | 38.38 |

English benchmark results were obtained using [EleutherAI LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) in a **0-shot** evaluation setting.
## Training & Alignment Pipeline

Chocolatine-2-4B-Instruct-DPO-v2.1 is derived from [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) using a multi-step post-training pipeline:
**Stage 1 – DPO (Compar:IA adaptation)**

Direct Preference Optimization (DPO) on a DPO-adapted version of **[Compar:IA](https://comparia.beta.gouv.fr/datasets)** data, derived from the preference dataset [comparia-votes](https://huggingface.co/datasets/ministere-culture/comparia-votes), part of a public initiative led by the French Ministry of Culture. Previous iterations of the Chocolatine model series were also selected as part of this initiative.
I constructed an original DPO dataset from these votes by transforming them into preference pairs (chosen / rejected), with additional filtering and formatting steps to make them suitable for DPO fine-tuning.
Two dataset variants were created ([6k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-6k) and [13k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-13k) preference pairs).
The **6k variant** was used for the DPO training reported in this release.
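To illustrate the pair-construction step, here is a minimal sketch of turning pairwise votes into chosen/rejected records. The field names (`prompt`, `response_a`, `response_b`, `vote`) are hypothetical; the actual comparia-votes schema and filtering steps may differ.

```python
# Minimal sketch of converting pairwise votes into DPO preference pairs.
# Field names are hypothetical; the real comparia-votes schema may differ.

def votes_to_dpo_pairs(votes):
    pairs = []
    for v in votes:
        if v["vote"] == "model_a":
            chosen, rejected = v["response_a"], v["response_b"]
        elif v["vote"] == "model_b":
            chosen, rejected = v["response_b"], v["response_a"]
        else:
            continue  # skip ties and invalid votes
        # basic filtering: drop empty or identical completions
        if not chosen or not rejected or chosen == rejected:
            continue
        pairs.append({"prompt": v["prompt"], "chosen": chosen, "rejected": rejected})
    return pairs

votes = [
    {"prompt": "Bonjour ?", "response_a": "Salut !", "response_b": "", "vote": "model_a"},
    {"prompt": "2+2 ?", "response_a": "5", "response_b": "4", "vote": "model_b"},
    {"prompt": "Egalite", "response_a": "x", "response_b": "y", "vote": "tie"},
]
print(votes_to_dpo_pairs(votes))
```

The empty completion in the first vote is still kept because only the rejected side must be non-empty for filtering to trigger; real filtering (length, language, deduplication) would be stricter.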
**Stage 2 – DPO (French-ORCA pairs)**

A second DPO stage using a French version of the ORCA preference pairs, based on the dataset **[jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised)**, commonly used in the Chocolatine training pipeline.
This stage further improves general instruction alignment, robustness across tasks, and cross-lingual capabilities.
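For reference, both DPO stages optimize the standard DPO objective, which widens the gap between the policy's and the reference model's log-probabilities for chosen versus rejected completions. A toy numeric sketch (the log-probabilities below are made up for illustration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss: -log(sigmoid(beta * margin)), where the margin
    compares policy-vs-reference log-ratios of chosen and rejected."""
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Toy numbers: the policy already prefers the chosen answer more strongly
# than the reference does, so the loss falls below log(2), its value at
# zero margin.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
print(round(loss, 4))  # → 0.5544
```

In actual training (e.g. with TRL's `DPOTrainer`), these log-probabilities are summed over completion tokens and the loss is averaged over a batch of preference pairs.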
**Stage 3 – Model Merging (MergeKit + TIES)**

The resulting checkpoints were merged using **MergeKit** with the TIES method.

TIES merging selects task-relevant parameter updates, reduces destructive interference between models, and preserves base-model stability.

MergeKit configuration:
```yaml
# ties2 recipe
models:
  - model: jpacifico/Qwen3-4B-Instruct-DPO-test2
    parameters:
      density: 0.5
      weight: 0.5
  - model: jpacifico/Qwen3-4B-Instruct-DPO-test-b3
    parameters:
      density: 0.5
      weight: 0.5

merge_method: ties
base_model: Qwen/Qwen3-4B-Instruct-2507

parameters:
  normalize: false
  int8_mask: true

dtype: bfloat16
```
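To illustrate the intuition behind this configuration, here is a toy sketch of TIES-style merging on flat parameter vectors: compute each model's delta from the base, trim to the configured `density` by magnitude, elect a sign per coordinate, then average only the deltas agreeing with that sign. This is a didactic simplification, not MergeKit's actual implementation.

```python
# Toy TIES-style merge on flat parameter vectors (didactic simplification,
# not MergeKit's actual implementation).

def ties_merge(base, models, density=0.5, weights=None):
    weights = weights or [1.0] * len(models)
    # 1. task vectors: per-model deltas from the base
    deltas = [[m[i] - base[i] for i in range(len(base))] for m in models]
    # 2. trim: keep only the top `density` fraction of entries by magnitude
    k = max(1, int(round(density * len(base))))
    trimmed = []
    for d in deltas:
        keep = set(sorted(range(len(d)), key=lambda i: abs(d[i]), reverse=True)[:k])
        trimmed.append([d[i] if i in keep else 0.0 for i in range(len(d))])
    merged = []
    for i in range(len(base)):
        # 3. sign election: sign of the weighted mass at this coordinate
        mass = sum(w * t[i] for w, t in zip(weights, trimmed))
        sign = 1.0 if mass >= 0 else -1.0
        # 4. merge only the deltas that agree with the elected sign
        agreeing = [(w, t[i]) for w, t in zip(weights, trimmed)
                    if t[i] != 0.0 and t[i] * sign > 0]
        if agreeing:
            total_w = sum(w for w, _ in agreeing)
            merged.append(base[i] + sum(w * x for w, x in agreeing) / total_w)
        else:
            merged.append(base[i])
    return merged

base = [0.0, 0.0, 0.0, 0.0]
model_a = [0.4, -0.1, 0.2, 0.0]   # hypothetical DPO checkpoint A
model_b = [0.2, 0.1, -0.3, 0.05]  # hypothetical DPO checkpoint B
print(ties_merge(base, [model_a, model_b], density=0.5, weights=[0.5, 0.5]))
```

Note how the third coordinate keeps only model B's update: the two checkpoints disagree in sign there, and sign election discards the minority update instead of letting the two cancel out.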
## Usage

The following code snippet illustrates how to use the model to generate content from a given input.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# run text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)

print("content:", content)
```
## Limitations

The Chocolatine-2 model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
It does not have any moderation mechanism.

- **Developed by:** Jonathan Pacifico, 2026
- **Model type:** LLM
- **Language(s) (NLP):** French, English
- **License:** Apache-2.0

Made with ❤️ in France