---
language:
- fr
- en
library_name: transformers
tags:
- dpo
- post-training
- french
- alignment
- model-merging
- qwen3
- chocolatine
- comparia
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- jpacifico/comparia-dpo-pairs-bt-6k
- jpacifico/french-orca-dpo-pairs-revised
---
# Chocolatine-2-4B-Instruct-DPO-v2.1
**Chocolatine-2-4B-Instruct-DPO-v2.1** is a post-trained version of [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507), designed to improve instruction-following, reasoning, and overall performance in French, while preserving strong multilingual capabilities.
In my evaluation setup, it delivers consistent gains across the tested French benchmarks, pointing to a broad improvement in French capabilities.
Although the post-training pipeline focuses on French preference data, no degradation is observed on English tasks, and slight improvements are sometimes seen, suggesting positive cross-lingual transfer.
Optimized variants (MLX, GGUF) are also available, making the model particularly suitable for local inference.
## Model Overview
- **Base model:** Qwen/Qwen3-4B-Instruct-2507
- **Parameters:** 4.0B
- **Context length:** 262,144 tokens natively
- **Post-training methods:** DPO + model merging
Note: This model supports only non-thinking mode and does not generate `<think></think>` blocks in its outputs.
This design is consistent with the goals of the post-training setup, which favors a compact dense instruct model focused on direct generation efficiency and practical downstream use.
For use cases requiring explicit reasoning traces or structured thinking outputs, [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) is recommended.
**Model Variants**
- Chocolatine-2-4B-Instruct-DPO-v2.1 (this repo): contains the trainable weights in BF16 format
- Quantized GGUF versions: [Q4_K_M](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q4_K_M-GGUF) / [Q8_0](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q8_0-GGUF), and more from mradermacher [here](https://huggingface.co/mradermacher/Chocolatine-2-4B-Instruct-DPO-v2.1-GGUF)
- MLX (optimized for Apple silicon): [4Bit](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-mlx-4Bit) / [8Bit](https://huggingface.co/jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-mlx-8Bit)

**Ollama**: In addition to the Hugging Face release, quantized 4-bit and 8-bit variants are also available on Ollama [here](https://ollama.com/jpacifico/chocolatine-2.1) for convenient local inference.
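For local inference with the GGUF builds, any GGUF-compatible runtime can load the quantized weights directly from the Hub. Below is a minimal sketch using the `llama-cpp-python` bindings (an assumption; llama.cpp, LM Studio, or Ollama work equally well). The filename glob is a guess based on the repo naming convention, not a confirmed filename.

```python
# Minimal local-inference sketch with llama-cpp-python
# (pip install llama-cpp-python huggingface-hub).
# NOTE: the filename glob is an assumption; check the repo for the exact file name.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1-Q4_K_M-GGUF",
    filename="*q4_k_m.gguf",  # glob pattern resolved against the repo's files
    n_ctx=8192,               # session context window; the model supports far more
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explique brièvement ce qu'est le DPO."}],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])
```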
## Benchmarks
The results indicate a consistent improvement across the tested French benchmarks, covering several capability types. This suggests a broad gain in French performance, while English results remain overall stable.
| Benchmark fr | Qwen3-4B-Instruct-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
|---|---:|---:|
| gpqa-fr:diamond | 28.93 | **32.49** |
| french_bench_arc_challenge | 47.13 | **49.79** |
| french_bench_grammar | 70.59 | **72.27** |
| french_bench_boolqa | 88.76 | **89.89** |
| french_bench_hellaswag | 56.99 | **58.03** |
| global_mmlu_fr | 63.75 | **64.75** |
| xwinograd_fr | 66.27 | **67.47** |
| fr_mt_bench | 6.22 | **6.44** |
*FR-MT-Bench* evaluation is performed on [MT-Bench-French](https://huggingface.co/datasets/bofenghuang/mt-bench-french), using [multilingual-mt-bench](https://github.com/jpacifico/multilingual_mt_bench) with OpenAI/GPT-5 as the LLM judge.
*global_mmlu_fr*, *xwinograd_fr* and *french_bench* results were obtained using [EleutherAI LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) in a **0-shot** evaluation setting.
*gpqa-fr:diamond* was evaluated with LightEval/vLLM, following the [kurakurai/Luth](https://github.com/kurakurai/Luth.git) evaluation process.
| Benchmark eng | Qwen3-4B-Instruct-2507 (base) | Chocolatine-2-4B-Instruct-DPO-v2.1 |
|---|---:|---:|
| arc_challenge | **58.79** | 58.45 |
| hellaswag | 69.08 | **70.16** |
| boolq | 84.80 | **85.32** |
| gpqa_diamond_zeroshot | **38.89** | 38.38 |
English benchmark results were obtained using [EleutherAI LM Eval Harness](https://github.com/EleutherAI/lm-evaluation-harness) in a **0-shot** evaluation setting.
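For reference, the harness can also be driven from Python to reproduce a 0-shot run. A minimal sketch, assuming a recent `lm-evaluation-harness` (v0.4+) install; the task names shown are the English ones from the table above and should be verified against your installed version's task registry.

```python
# Minimal 0-shot evaluation sketch with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Task names are assumptions; list them with `lm-eval --tasks list`.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1,dtype=auto",
    tasks=["arc_challenge", "hellaswag", "boolq"],
    num_fewshot=0,        # 0-shot setting, matching the tables above
    batch_size="auto",
)
for task, metrics in results["results"].items():
    print(task, metrics)
```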
## Training & Alignment Pipeline
Chocolatine-2-4B-Instruct-DPO-v2.1 is derived from [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507) using a multi-step post-training pipeline:
**Stage 1 DPO (Compar:IA adaptation)**
Direct Preference Optimization (DPO) on a DPO-adapted version of **[Compar:IA](https://comparia.beta.gouv.fr/datasets)** data, derived from the preference dataset [comparia-votes](https://huggingface.co/datasets/ministere-culture/comparia-votes), part of a public initiative led by the French Ministry of Culture. Previous iterations of the Chocolatine model series were also selected as part of this initiative.
I constructed an original DPO dataset from these votes by transforming them into preference pairs (chosen / rejected), with additional filtering and formatting steps to make them suitable for DPO fine-tuning.
Two dataset variants were created ([6k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-6k) and [13k](https://huggingface.co/datasets/jpacifico/comparia-dpo-pairs-bt-13k) preference pairs).
The **6k variant** was used for the DPO training reported in this release.
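Both DPO stages follow the standard preference-optimization recipe. Below is a minimal sketch of Stage 1 using Hugging Face TRL's `DPOTrainer` (a recent TRL version is assumed); the hyperparameters are illustrative placeholders, not the values used for this release, and the dataset is assumed to expose the usual `prompt` / `chosen` / `rejected` columns.

```python
# Illustrative Stage-1 DPO sketch with TRL (pip install trl datasets).
# Hyperparameters are placeholders, NOT the release's actual training settings.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "Qwen/Qwen3-4B-Instruct-2507"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto")

# assumed to follow the standard prompt / chosen / rejected column format
dataset = load_dataset("jpacifico/comparia-dpo-pairs-bt-6k", split="train")

training_args = DPOConfig(
    output_dir="chocolatine-dpo-stage1",
    beta=0.1,                       # DPO temperature (placeholder)
    learning_rate=5e-6,             # placeholder
    per_device_train_batch_size=2,  # placeholder
    num_train_epochs=1,             # placeholder
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```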
**Stage 2 DPO (French-ORCA pairs)**
A second DPO stage using a French version of the ORCA preference pairs, based on the dataset **[jpacifico/french-orca-dpo-pairs-revised](https://huggingface.co/datasets/jpacifico/french-orca-dpo-pairs-revised)**, commonly used in the Chocolatine training pipeline.
This stage further improves general instruction alignment, robustness across tasks, and cross-lingual capabilities.
**Stage 3 Model Merging (MergeKit + TIES)**
The resulting checkpoints were merged using **MergeKit** with the TIES method.
TIES merging selects task-relevant parameter updates, reduces destructive interference between the merged models, and preserves base-model stability.
MergeKit configuration:
```yaml
# ties2 recipe
models:
  - model: jpacifico/Qwen3-4B-Instruct-DPO-test2
    parameters:
      density: 0.5
      weight: 0.5
  - model: jpacifico/Qwen3-4B-Instruct-DPO-test-b3
    parameters:
      density: 0.5
      weight: 0.5
merge_method: ties
base_model: Qwen/Qwen3-4B-Instruct-2507
parameters:
  normalize: false
  int8_mask: true
dtype: bfloat16
```
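This recipe can be executed with MergeKit's standard `mergekit-yaml` entry point, e.g. `mergekit-yaml ties2.yaml ./merged-model` (assuming the config above is saved as `ties2.yaml`; the output path is illustrative).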
## Usage
The following code snippet illustrates how to use the model to generate content from a given input.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jpacifico/Chocolatine-2-4B-Instruct-DPO-v2.1"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
```
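Since the post-training targets French, the model can of course be prompted in French directly, for example `prompt = "Donne-moi une courte introduction aux grands modèles de langage."` (an illustrative prompt, equivalent to the English one above).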
## Limitations
The Chocolatine-2 model series is a quick demonstration that a base model can be easily fine-tuned to achieve compelling performance.
It does not have any moderation mechanism.
- **Developed by:** Jonathan Pacifico, 2026
- **Model type:** LLM
- **Language(s) (NLP):** French, English
- **License:** Apache-2.0
Made with ❤️ in France