---
library_name: transformers
base_model: Qwen/Qwen3-8B-Base
tags:
- multilingual
- reasoning
- LLM
- qwen3
license: apache-2.0
datasets:
- lightonai/Dolci-Think-SFT-32B-Multilingual
language:
- en
pipeline_tag: text-generation
---

# Qwen3-8B-EN

`Qwen3-8B-EN` is a **native reasoning model** fine-tuned from [`Qwen/Qwen3-8B-Base`](https://huggingface.co/Qwen/Qwen3-8B-Base) to reason in English. This model produces its **entire reasoning trace in English** before delivering the final answer in English.

It is released alongside the paper [**Rethinking the Multilingual Reasoning Gap with Layer Swap**](https://arxiv.org/abs/2605.26735).

## Model details

- **Base model:** `Qwen/Qwen3-8B-Base`
- **Language:** English (CoT and answer)
- **Training:** Full SFT, ~10B tokens, 2 epochs
- **Context length:** 32,768 tokens
- **Dataset:** [`lightonai/Dolci-Think-SFT-32B-Multilingual`](https://huggingface.co/datasets/lightonai/Dolci-Think-SFT-32B-Multilingual) (English split)

> [!NOTE]
> The model was trained on data derived from `allenai/Dolci-Think-SFT-32B`, released under the ODC-BY-1.0 license.

## Evaluation

All scores are mean accuracy (%) on the **English** version of each benchmark, with sample standard deviation across runs. AIME 24/25 is averaged over 30 runs; the others over 10 runs, using the recommended generation parameters.

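
The aggregation described above (mean accuracy and sample standard deviation across runs) can be sketched as follows. This is a minimal illustration, not the evaluation code used for this model: the helper name `summarize_runs` and the per-run accuracies are hypothetical and are not results from this card.

```python
import statistics


def summarize_runs(run_accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies in %."""
    if len(run_accuracies) < 2:
        raise ValueError("need at least two runs for a sample standard deviation")
    mean = statistics.mean(run_accuracies)
    # statistics.stdev uses the n - 1 denominator, i.e. the sample standard deviation.
    std = statistics.stdev(run_accuracies)
    return mean, std


# Hypothetical per-run accuracies (%), made up for illustration only.
example_runs = [60.0, 65.0, 62.5, 62.5]
mean, std = summarize_runs(example_runs)
print(f"{mean:.2f} ± {std:.2f}")
```

The sample (rather than population) standard deviation is the natural choice here because each benchmark is evaluated over a small, finite number of sampled runs.
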
| Model | MGSM-Rev2 | Global-MMLU-Lite | GPQA-Diamond | AIME 24/25 | HumanEvalPlus | Average |
|---|:---:|:---:|:---:|:---:|:---:|:---:|
| `Qwen3-8B-EN` | 98.96 | 81.72 | 55.66 | 62.89 | 85.75 | 77.00 |

**Benchmarks used:**

- [`lightonai/gpqa_diamond_multilingual`](https://huggingface.co/datasets/lightonai/gpqa_diamond_multilingual)
- [`lightonai/aime24_multilingual`](https://huggingface.co/datasets/lightonai/aime24_multilingual)
- [`lightonai/aime25_multilingual`](https://huggingface.co/datasets/lightonai/aime25_multilingual)
- [`lightonai/HumanEvalPlus_multilingual`](https://huggingface.co/datasets/lightonai/HumanEvalPlus_multilingual)
- [`lightonai/mgsm-rev2`](https://huggingface.co/datasets/lightonai/mgsm-rev2)
- [`CohereLabs/Global-MMLU-Lite`](https://huggingface.co/datasets/CohereLabs/Global-MMLU-Lite)

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "lightonai/Qwen3-8B-EN"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Solve: 24 × 17 = ?"}]
# return_dict=True yields input_ids and attention_mask, which generate() accepts as keyword arguments.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

# do_sample=True is set explicitly so the sampling parameters below take effect.
outputs = model.generate(
    **inputs, max_new_tokens=32768, do_sample=True,
    temperature=1.0, top_p=0.95, top_k=20, min_p=0.0,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

Recommended sampling: `temperature=1.0`, `top_p=0.95`, `top_k=20`, `min_p=0`.

## Citation

If you find our work helpful, please cite the paper:

```bibtex
@misc{lasbordes2026rethinking,
  title         = {Rethinking the Multilingual Reasoning Gap with Layer Swap},
  author        = {Lasbordes, Maxence and Chatelain, Amélie and Seddah, Djamé},
  year          = {2026},
  eprint        = {2605.26735},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```