---
library_name: transformers
tags:
- small-lm
- math
- reasoning
- slm
- french
license: apache-2.0
datasets:
- openai/gsm8k
- kurakurai/luth-sft
- cmh/gsm8k_fr
base_model:
- Qwen/Qwen3-0.6B
language:
- fr
- en
---

# Qwen3-0.6B-Fr

This model was obtained by fine-tuning [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) on the [kurakurai/luth-sft](https://huggingface.co/datasets/kurakurai/luth-sft) dataset, specifically the
subsets luth_smoltalk2, luth_aya_dataset, luth_croissantllm and luth_tulu3_persona_instruct.
The model is used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
A single A100 GPU was used for fine-tuning and evaluation.

The following versions were used for training and evaluation:

- python >= 3.10
- torch               : 2.9.0+cu128
- lm_eval             : 0.4.9.1
- vllm                : 0.11.1
- transformers        : 4.57.6
- datasets            : 3.2.0
- numpy               : 2.2.6
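
To compare a local environment against the versions listed above, a minimal sketch using only the standard library could look like this. The `PINNED` dictionary and the `check_versions` helper are illustrative names, not part of the original setup, and the distribution names are assumed to match the list.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above; the dictionary name is illustrative.
PINNED = {
    "torch": "2.9.0+cu128",
    "lm_eval": "0.4.9.1",
    "vllm": "0.11.1",
    "transformers": "4.57.6",
    "datasets": "3.2.0",
    "numpy": "2.2.6",
}


def check_versions(pinned):
    """Return {package: (expected, installed or None)} for every pinned package."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed in this environment
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        status = "ok" if installed == expected else "differs"
        print(f"{name:<14} expected {expected:<12} installed {installed} ({status})")
```

Other versions may work as well; the check only reports differences and does not enforce anything.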

## Training

The [TRL](https://github.com/huggingface/trl) library was used for full-rank supervised fine-tuning (SFT):

```bash
python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-0.6B --dataset_name kurakurai/luth-sft --dataset_config main --learning_rate 2e-5 \
--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy no \
--completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
```

This setup is far from the most compute- or performance-efficient way to fine-tune, but it can serve as a reasonable baseline.
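
For reference, the batch-size flags in the command above combine into the number of sequences per optimizer step. The helper below is a small illustrative sketch (the function name is not from TRL), assuming the single-GPU setup mentioned earlier.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of training sequences that contribute to one optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values taken from the command above; one A100 was used, so num_devices is 1.
print(effective_batch_size(2, 8, 1))  # 16
```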

The dataset was converted to the conversational format, and the four subsets were concatenated in `trl/scripts/sft.py`:

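```python
# trl/scripts/sft.py (excerpt; model, training_args, script_args and model_args
# are defined earlier in the script)
from datasets import concatenate_datasets, load_dataset

dataset = load_dataset(...)

# Train on the concatenation of the four luth-sft subsets listed above.
train_subsets = ["luth_smoltalk2", "luth_aya_dataset", "luth_croissantllm", "luth_tulu3_persona_instruct"]

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=concatenate_datasets([dataset[name] for name in train_subsets]),
    eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None,
    peft_config=get_peft_config(model_args),
)
```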
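
As an illustration of what the conversational format looks like, the sketch below maps a question/answer pair to a list of role/content messages. The `to_conversation` helper and its argument names are hypothetical and do not reflect the actual luth-sft schema or the preprocessing used for this model.

```python
def to_conversation(question, answer, system_prompt=None):
    """Build one training example in the conversational `messages` layout."""
    messages = []
    if system_prompt:
        # Optional system turn placed before the user turn.
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": question})
    messages.append({"role": "assistant", "content": answer})
    return {"messages": messages}


example = to_conversation("Combien font 2 + 3 ?", "2 + 3 = 5.")
print(example)
```

TRL also accepts a prompt/completion variant of the conversational format; this sketch does not claim which of the two was used here.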

## Evaluation

Evaluation was run with lm_eval on the test split of [gsm8k](https://huggingface.co/datasets/openai/gsm8k),
[french_bench (avg score)](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/french_bench) and [gsm8k-fr](https://huggingface.co/datasets/cmh/gsm8k_fr):

```bash
python -m lm_eval --model vllm --model_args pretrained=${model},tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.9,data_parallel_size=1 \
 --tasks gsm8k,french_bench,gsm8k-fr --batch_size 1 --apply_chat_template=True --confirm_run_unsafe_code --trust_remote_code
```

To evaluate on gsm8k-fr, you can use the task from our fork: https://github.com/bknyaz/lm-evaluation-harness/tree/main/lm_eval/tasks/gsm8k

### Results

| Model         | gsm8k | french_bench (avg) | gsm8k-fr | avg  |
|---------------|-------|--------------------|----------|------|
| Qwen3-0.6B    | 21.0  | 24.4               | 19.6     | 21.7 |
| Qwen3-0.6B-Fr | 36.1  | 26.5               | 26.5     | 29.7 |
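
The avg column is consistent with an unweighted mean of the three task scores, rounded to one decimal. The short check below reproduces it from the table values; the `scores` dictionary and `mean_score` helper are illustrative names.

```python
# Per-task scores copied from the table: gsm8k, french_bench (avg), gsm8k-fr.
scores = {
    "Qwen3-0.6B": [21.0, 24.4, 19.6],
    "Qwen3-0.6B-Fr": [36.1, 26.5, 26.5],
}


def mean_score(values):
    """Unweighted mean of the task scores, rounded to one decimal."""
    return round(sum(values) / len(values), 1)


for model, values in scores.items():
    print(model, mean_score(values))
# Qwen3-0.6B 21.7
# Qwen3-0.6B-Fr 29.7
```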


## License

Please refer to the license of the original model [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) and dataset [kurakurai/luth-sft](https://huggingface.co/datasets/kurakurai/luth-sft).