---
library_name: transformers
license: apache-2.0
datasets:
- kurakurai/luth-sft
language:
- fr
- en
base_model:
- Qwen/Qwen3-1.7B
pipeline_tag: text-generation
---

# Luth-1.7B-Instruct

**Luth-1.7B-Instruct** is a French fine-tuned version of [Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B), trained on the [Luth-SFT](https://huggingface.co/datasets/kurakurai/luth-sft) dataset. Compared with the base model, it is substantially stronger in French on instruction following, math, and general knowledge. Its English capabilities remain stable overall and improve on some benchmarks.

Our evaluation, training, and data scripts are available on [GitHub](https://github.com/kurakurai/Luth), along with the [blog post](https://huggingface.co/blog/MaxLSB/luth) we wrote.

## Model Details

Luth was trained with full fine-tuning on the Luth-SFT dataset using [Axolotl](https://github.com/axolotl-ai-cloud/axolotl). The resulting model was then merged with the base Qwen3-1.7B model. This process preserved the model's English capabilities while improving its performance on most of the selected benchmarks in both French and English.

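To make the merge step concrete, here is a minimal sketch of one common way to merge a fine-tuned checkpoint with its base: linear interpolation of matching weights. This card does not state which merge method or ratio was used, so both the method and the `alpha` parameter are assumptions, and `merge_weights` is a hypothetical helper rather than part of any library.

```python
def merge_weights(base, tuned, alpha=0.5):
    """Linearly interpolate two weight dictionaries that share the same keys.

    Hypothetical helper for illustration only. `alpha` is the share taken
    from the fine-tuned weights; 0.5 is an arbitrary default, not the
    ratio used for Luth.
    """
    if base.keys() != tuned.keys():
        raise ValueError("base and fine-tuned weights must share the same keys")
    # Works for any values that support scalar multiplication and addition,
    # for example plain floats or tensors taken from a model's state dict.
    return {
        name: alpha * tuned[name] + (1.0 - alpha) * base[name]
        for name in base
    }


# Toy example with scalar "weights".
base = {"w": 1.0, "b": 0.0}
tuned = {"w": 3.0, "b": 2.0}
print(merge_weights(base, tuned, alpha=0.5))
```

With `alpha=1.0` the result is the fine-tuned model and with `alpha=0.0` it is the base model; values in between trade off the two.
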
## Benchmark Results

We used LightEval for evaluation, with custom tasks for the French benchmarks. All models were evaluated with `temperature=0`. In the tables below, the best score in each column is underlined.

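A temperature of 0 is usually treated as greedy decoding: the highest-scoring token is always chosen, so repeated runs give the same output. The sketch below illustrates that convention in plain Python. `next_token` is a hypothetical helper, not LightEval's or Transformers' implementation.

```python
import math
import random


def next_token(logits, temperature, rng=random):
    """Pick a token index from raw logits (hypothetical helper)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature-scaled softmax weights, shifted by the max for stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0]
print(next_token(logits, temperature=0))    # deterministic
print(next_token(logits, temperature=1.0))  # sampled, may vary between runs
```
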
### French Benchmark Scores

| Model                  | IFEval<br>French | GPQA-Diamond<br>French | MMLU<br>French | Math500<br>French | Arc-Challenge<br>French | Hellaswag<br>French |
|------------------------|------------------|------------------------|----------------|-------------------|-------------------------|---------------------|
| **Luth-1.7B-Instruct** | <u>58.53</u>     | <u>36.55</u>           | <u>49.75</u>   | <u>62.60</u>      | 35.16                   | 31.88               |
| Qwen3-1.7B             | 54.71            | 31.98                  | 28.49          | 60.40             | 33.28                   | 24.86               |
| SmolLM2-1.7B-Instruct  | 30.93            | 20.30                  | 33.73          | 10.20             | 28.57                   | <u>49.58</u>        |
| Qwen2.5-1.5B-Instruct  | 31.30            | 27.41                  | 46.25          | 33.20             | 32.68                   | 34.33               |
| LFM2-1.2B              | 54.41            | 22.84                  | 47.59          | 36.80             | <u>39.44</u>            | 33.05               |

### English Benchmark Scores

| Model                  | IFEval<br>English | GPQA-Diamond<br>English | MMLU<br>English | Math500<br>English | Arc-Challenge<br>English | Hellaswag<br>English |
|------------------------|-------------------|-------------------------|-----------------|--------------------|--------------------------|----------------------|
| **Luth-1.7B-Instruct** | 65.80             | 29.80                   | <u>60.28</u>    | 70.40              | 42.24                    | 58.53                |
| Qwen3-1.7B             | <u>68.88</u>      | <u>31.82</u>            | 52.82           | <u>71.20</u>       | 36.18                    | 46.98                |
| SmolLM2-1.7B-Instruct  | 49.04             | 25.08                   | 50.27           | 22.67              | 42.32                    | <u>66.94</u>         |
| Qwen2.5-1.5B-Instruct  | 39.99             | 25.76                   | 59.81           | 57.20              | 41.04                    | 64.48                |
| LFM2-1.2B              | 68.52             | 24.24                   | 55.22           | 45.80              | <u>42.58</u>             | 57.61                |

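To read the two tables as a comparison against the base model, the short script below computes the score difference between Luth-1.7B-Instruct and Qwen3-1.7B for each benchmark. The numbers are copied from the tables above. The `SCORES` layout and the `deltas` helper are illustrative and not part of the evaluation scripts.

```python
# (Luth-1.7B-Instruct, Qwen3-1.7B) scores copied from the benchmark tables.
SCORES = {
    "French": {
        "IFEval": (58.53, 54.71),
        "GPQA-Diamond": (36.55, 31.98),
        "MMLU": (49.75, 28.49),
        "Math500": (62.60, 60.40),
        "Arc-Challenge": (35.16, 33.28),
        "Hellaswag": (31.88, 24.86),
    },
    "English": {
        "IFEval": (65.80, 68.88),
        "GPQA-Diamond": (29.80, 31.82),
        "MMLU": (60.28, 52.82),
        "Math500": (70.40, 71.20),
        "Arc-Challenge": (42.24, 36.18),
        "Hellaswag": (58.53, 46.98),
    },
}


def deltas(scores):
    """Return Luth minus base score per benchmark, rounded to two decimals."""
    return {
        lang: {bench: round(luth - base, 2) for bench, (luth, base) in table.items()}
        for lang, table in scores.items()
    }


for lang, table in deltas(SCORES).items():
    for bench, delta in table.items():
        print(f"{lang:<8} {bench:<14} {delta:+.2f}")
```

On these numbers, Luth scores above Qwen3-1.7B on all six French benchmarks and on three of the six English ones (MMLU, Arc-Challenge, and Hellaswag).
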
## Code Example

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("kurakurai/Luth-1.7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("kurakurai/Luth-1.7B-Instruct")

# A single-turn conversation ("What is the capital of France?").
messages = [
    {"role": "user", "content": "Quelle est la capitale de la France?"},
]

# Apply the chat template and tokenize in one step.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)

# Decode only the newly generated tokens, skipping the prompt.
print(
    tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[-1] :], skip_special_tokens=True
    )
)
```

## Citation

```bibtex
@misc{luth2025kurakurai,
  title        = {Luth: Efficient French Specialization for Small Language Models and Cross-Lingual Transfer},
  author       = {Lasbordes, Maxence and Gad, Sinoué},
  year         = {2025},
  howpublished = {\url{https://arxiv.org/abs/2510.05846}},
  note         = {arXiv:2510.05846}
}
```