---
library_name: transformers
base_model: Qwen/Qwen3-8B-Base
tags:
- multilingual
- reasoning
- LLM
- qwen3
license: apache-2.0
datasets:
- lightonai/Dolci-Think-SFT-32B-Multilingual
language:
- fr
pipeline_tag: text-generation
---

# Qwen3-8B-FR

`Qwen3-8B-FR` is a **native reasoning model** fine-tuned from [`Qwen/Qwen3-8B-Base`](https://huggingface.co/Qwen/Qwen3-8B-Base) to reason in French. It produces its **entire reasoning trace in French** before delivering the final answer, also in French.

It is released alongside the paper [**Rethinking the Multilingual Reasoning Gap with Layer Swap**](https://arxiv.org/abs/2605.26735).

## Model details

- **Base model:** `Qwen/Qwen3-8B-Base`
- **Language:** French (CoT and answer)
- **Training:** Full SFT, ~10B tokens, 2 epochs
- **Context length:** 32,768 tokens
- **Dataset:** [`lightonai/Dolci-Think-SFT-32B-Multilingual`](https://huggingface.co/datasets/lightonai/Dolci-Think-SFT-32B-Multilingual) (French split)

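
Because the context length is 32,768 tokens, a long prompt leaves less room for the reasoning trace. The sketch below shows one way to cap the generation length for a given prompt size. The helper name `generation_budget` is hypothetical, and the assumption that prompt and generated tokens must together fit in the context window is ours, not a statement from the released code.

```python
CONTEXT_LENGTH = 32_768  # context length listed in the model details


def generation_budget(prompt_tokens: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens fit after a prompt of `prompt_tokens` tokens.

    Hypothetical helper: assumes prompt and generated tokens must together
    fit within `context_length`.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(context_length - prompt_tokens, 0)


print(generation_budget(1_000))  # 31768
```

The returned value could be passed as `max_new_tokens` when generating from long prompts.
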

> [!NOTE]
> The model was trained on data derived from `allenai/Dolci-Think-SFT-32B`, released under the ODC-BY-1.0 license.

## Related models

This model is part of a French specialist trio designed to study the native reasoning gap. The trio is listed below together with the English specialist `Qwen3-8B-EN`:

| Model | CoT language | Description |
|---|---|---|
| [`lightonai/Qwen3-8B-FR`](https://huggingface.co/lightonai/Qwen3-8B-FR) | French | Native reasoning specialist |
| [`lightonai/Qwen3-8B-FR-Swap`](https://huggingface.co/lightonai/Qwen3-8B-FR-Swap) | French | Layer Swap: middle layers (L13–L22) of `Qwen3-8B-EN` transplanted into `Qwen3-8B-FR` |
| [`lightonai/Qwen3-8B-FR-Pivot-EN`](https://huggingface.co/lightonai/Qwen3-8B-FR-Pivot-EN) | English | Same French Q&A pairs, but CoT in English |
| [`lightonai/Qwen3-8B-EN`](https://huggingface.co/lightonai/Qwen3-8B-EN) | English | English specialist |

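
The Layer Swap row describes transplanting a block of middle layers from one model into another. The following is a minimal sketch of that idea on plain dictionaries standing in for state dicts. It assumes parameter names of the form `model.layers.<index>.…` and treats L13–L22 as an inclusive range of layer indices; the indexing convention and the actual procedure used in the paper may differ, and `swap_layers` is a hypothetical helper.

```python
import re
from typing import Any, Dict, Iterable

LAYER_KEY = re.compile(r"^model\.layers\.(\d+)\.")  # assumed parameter naming


def swap_layers(recipient: Dict[str, Any], donor: Dict[str, Any],
                layers: Iterable[int]) -> Dict[str, Any]:
    """Return a copy of `recipient` whose selected layers come from `donor`."""
    selected = set(layers)
    merged = dict(recipient)  # shallow copy; the inputs are left untouched
    for name, value in donor.items():
        match = LAYER_KEY.match(name)
        if match and int(match.group(1)) in selected:
            if name not in merged:
                raise KeyError(f"recipient has no parameter named {name!r}")
            merged[name] = value
    return merged


# Toy "state dicts": strings stand in for weight tensors, 36 is a toy layer count.
fr = {f"model.layers.{i}.mlp.weight": f"fr-{i}" for i in range(36)}
en = {f"model.layers.{i}.mlp.weight": f"en-{i}" for i in range(36)}
swapped = swap_layers(fr, en, range(13, 23))  # assumed inclusive L13-L22
print(swapped["model.layers.12.mlp.weight"], swapped["model.layers.13.mlp.weight"])
```

With real checkpoints, the same idea would presumably apply to the mappings returned by `state_dict()`, followed by `load_state_dict` on the recipient; this has not been checked against the released models.
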

## Evaluation

All scores are mean accuracy (%) on the **French** version of each benchmark, obtained with the recommended generation parameters. AIME 24/25 is averaged over 30 runs and the other benchmarks over 10 runs; the sample standard deviation was computed across runs, and the table reports the means. Underlined scores are the best in each column.

| Model | MGSM-Rev2 | Global-MMLU-Lite | GPQA-Diamond | AIME 24/25 | HumanEvalPlus | Average |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| `Qwen3-8B-FR` | 92.80 | 76.45 | 53.59 | 55.67 | 83.31 | 72.36 |
| `Qwen3-8B-FR-Swap` | <u>97.40</u> | 76.57 | 54.55 | 59.11 | <u>86.06</u> | 74.74 |
| `Qwen3-8B-FR-Pivot-EN` | 94.52 | <u>78.37</u> | <u>54.65</u> | <u>62.78</u> | 84.88 | <u>75.04</u> |
| `Qwen3-8B-EN` | 95.72 | 77.50 | 52.53 | 61.39 | 84.19 | 74.27 |

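
As an illustration of the aggregation described above, the sketch below computes a mean accuracy and a sample standard deviation across runs with the standard library, then recomputes the Average column for the `Qwen3-8B-FR` row. The per-run accuracies are made up for illustration and are not results from the paper; `summarize` is a hypothetical helper.

```python
from statistics import mean, stdev


def summarize(run_accuracies):
    """Mean and sample standard deviation (n - 1 denominator) over runs."""
    return mean(run_accuracies), stdev(run_accuracies)


# Hypothetical per-run accuracies (%), for illustration only.
runs = [92.0, 93.0, 94.0]
avg, std = summarize(runs)
print(f"{avg:.2f} ± {std:.2f}")  # 93.00 ± 1.00

# The Average column is the unweighted mean of the five benchmark scores.
fr_scores = [92.80, 76.45, 53.59, 55.67, 83.31]  # Qwen3-8B-FR row of the table
print(round(mean(fr_scores), 2))  # 72.36
```
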

**Benchmarks used:**

- [`lightonai/gpqa_diamond_multilingual`](https://huggingface.co/datasets/lightonai/gpqa_diamond_multilingual)
- [`lightonai/aime24_multilingual`](https://huggingface.co/datasets/lightonai/aime24_multilingual)
- [`lightonai/aime25_multilingual`](https://huggingface.co/datasets/lightonai/aime25_multilingual)
- [`lightonai/HumanEvalPlus_multilingual`](https://huggingface.co/datasets/lightonai/HumanEvalPlus_multilingual)
- [`lightonai/mgsm-rev2`](https://huggingface.co/datasets/lightonai/mgsm-rev2)
- [`CohereLabs/Global-MMLU-Lite`](https://huggingface.co/datasets/CohereLabs/Global-MMLU-Lite)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lightonai/Qwen3-8B-FR"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

# Build the chat prompt and move the tensors to the model's device.
messages = [{"role": "user", "content": "Résous : 24 × 17 = ?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

# do_sample=True ensures the sampling parameters below take effect.
outputs = model.generate(
    **inputs, max_new_tokens=32768, do_sample=True, temperature=1.0, top_p=0.95, top_k=20, min_p=0.0
)
# Decode only the newly generated tokens (reasoning trace, then the answer).
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```

Recommended sampling: `temperature=1.0`, `top_p=0.95`, `top_k=20`, `min_p=0`.
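
The decoded output contains the French reasoning trace followed by the final answer. The sketch below shows one way to separate the two. It assumes the reasoning is closed by a `</think>` tag, as is common for Qwen3-style reasoning models; this delimiter is an assumption that this card does not confirm, and `split_reasoning` is a hypothetical helper.

```python
from typing import Tuple

THINK_END = "</think>"  # assumed delimiter, not confirmed by this card


def split_reasoning(text: str, end_tag: str = THINK_END) -> Tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    If `end_tag` is absent, the reasoning is empty and the whole text is
    returned as the answer.
    """
    head, sep, tail = text.rpartition(end_tag)
    if not sep:
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


sample = "<think>24 × 17 = 24 × 10 + 24 × 7 = 240 + 168 = 408.</think>\n\nLa réponse est 408."
reasoning, answer = split_reasoning(sample)
print(answer)  # La réponse est 408.
```

If the tag does not appear in the decoded text, the helper returns the whole output as the answer, so it is safe to call even when the assumption does not hold.
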

## Citation

If you find our work helpful, please cite it:

```bibtex
@misc{lasbordes2026rethinking,
  title         = {Rethinking the Multilingual Reasoning Gap with Layer Swap},
  author        = {Lasbordes, Maxence and Chatelain, Amélie and Seddah, Djamé},
  year          = {2026},
  eprint        = {2605.26735},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```