Initialize project with the model provided by the ModelHub XC community

Model: bknyaz/Qwen3-0.6B-Fr Source: Original Platform
2026-05-30 01:17:43 +08:00
commit 2ac68cd76c
12 changed files with 151980 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,90 @@
+---
+library_name: transformers
+tags:
+- small-lm
+- math
+- reasoning
+- slm
+- french
+license: apache-2.0
+datasets:
+- openai/gsm8k
+- kurakurai/luth-sft
+- cmh/gsm8k_fr
+base_model:
+- Qwen/Qwen3-0.6B
+language:
+- fr
+- en
+---
+
+# Qwen3-0.6B-Fr
+
+This model was obtained by fine-tuning Qwen/Qwen3-0.6B on the [kurakurai/luth-sft](https://huggingface.co/datasets/kurakurai/luth-sft) dataset, specifically
+the luth_smoltalk2, luth_aya_dataset, luth_croissantllm and luth_tulu3_persona_instruct subsets.
+The model is used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
+A single A100 GPU was used for fine-tuning and evaluation.
+
+The following package versions were used for training and evaluation:
+
+- python >= 3.10
+- torch               : 2.9.0+cu128
+- lm_eval             : 0.4.9.1
+- vllm                : 0.11.1
+- transformers        : 4.57.6
+- datasets            : 3.2.0
+- numpy               : 2.2.6
+
+## Training
+
+The [TRL](https://github.com/huggingface/trl) library was used for full-rank supervised fine-tuning (SFT):
+
+```bash
+python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-0.6B --dataset_name kurakurai/luth-sft --dataset_config main --learning_rate 2e-5 \
+--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy no \
+--completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
+```
+
+This is far from the most compute-efficient or best-performing fine-tuning setup, but it can serve as a reasonable baseline.
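+
+As a rough check of the settings in the command above, the effective batch size per optimizer step follows from the per-device batch size, the gradient accumulation steps and the number of GPUs. The GPU count is assumed to be 1 because a single A100 was used, and the symbol names are illustrative only:
+
+```latex
+B_{\text{eff}} = B_{\text{device}} \times N_{\text{accum}} \times N_{\text{GPU}}
+             = 2 \times 8 \times 1
+             = 16 \ \text{sequences per optimizer step}
+```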
+
+The dataset was preprocessed into the conversational format, and the four subsets were concatenated in the training script:
+
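+```python
+# trl/scripts/sft.py (excerpt: `model`, `training_args`, `script_args` and
+# `model_args` are assumed to be defined elsewhere in the script)
+from datasets import concatenate_datasets, load_dataset
+from trl import SFTTrainer, get_peft_config
+
+dataset = load_dataset(...)
+
+# Train on the concatenation of the four luth-sft subsets.
+subsets = ['luth_smoltalk2', 'luth_aya_dataset', 'luth_croissantllm', 'luth_tulu3_persona_instruct']
+train_dataset = concatenate_datasets([dataset[name] for name in subsets])
+
+trainer = SFTTrainer(
+    model=model,
+    args=training_args,
+    train_dataset=train_dataset,
+    # No evaluation set is used when eval_strategy is "no".
+    eval_dataset=dataset[script_args.dataset_test_split] if training_args.eval_strategy != "no" else None,
+    peft_config=get_peft_config(model_args),
+)
+```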
+
+## Evaluation
+
+The model was evaluated with lm_eval on the test split of [gsm8k](https://huggingface.co/datasets/openai/gsm8k),
+on [french_bench](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/french_bench) (average score), and on [gsm8k-fr](https://huggingface.co/datasets/cmh/gsm8k_fr):
+
+```bash
+python -m lm_eval --model vllm --model_args pretrained=${model},tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.9,data_parallel_size=1 \
+ --tasks gsm8k,french_bench,gsm8k-fr --batch_size 1 --apply_chat_template=True --confirm_run_unsafe_code --trust_remote_code
+```
+
+To evaluate on gsm8k-fr, you can use our fork: https://github.com/bknyaz/lm-evaluation-harness/tree/main/lm_eval/tasks/gsm8k
+
+### Results
+
+| Model         | gsm8k | french_bench | gsm8k-fr | avg  |
+|---------------|-------|--------------|----------|------|
+| Qwen3-0.6B    | 21.0  | 24.4         | 19.6     | 21.7 |
+| Qwen3-0.6B-Fr | 36.1  | 26.5         | 26.5     | 29.7 |
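+
+The avg column appears to be the unweighted mean of the three benchmark scores. This is an inference from the reported numbers rather than something stated explicitly, and it can be checked as follows:
+
+```latex
+\begin{aligned}
+\text{avg}_{\text{Qwen3-0.6B}}    &= \frac{21.0 + 24.4 + 19.6}{3} = \frac{65.0}{3} \approx 21.7 \\
+\text{avg}_{\text{Qwen3-0.6B-Fr}} &= \frac{36.1 + 26.5 + 26.5}{3} = \frac{89.1}{3} = 29.7
+\end{aligned}
+```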
+
+
+
+## License
+
+Please refer to the licenses of the original model [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) and the dataset [kurakurai/luth-sft](https://huggingface.co/datasets/kurakurai/luth-sft).