Initialize project; model provided by the ModelHub XC community

Model: bknyaz/Qwen3-0.6B-Math Source: Original Platform
2026-06-04 13:11:55 +08:00
commit 0fde4bfc20
12 changed files with 151969 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,79 @@
+---
+library_name: transformers
+tags:
+- small-lm
+- math
+- reasoning
+- slm
+license: apache-2.0
+datasets:
+- openai/gsm8k
+base_model:
+- Qwen/Qwen3-0.6B
+---
+
+# Qwen3-0.6B-Math
+
+This model was obtained by fine-tuning Qwen/Qwen3-0.6B on the train split of [gsm8k](https://huggingface.co/datasets/openai/gsm8k).
+It is used in the experiments described in https://bknyaz.github.io/blog/2026/meta-merge/.
+A single A100 GPU was used for fine-tuning and evaluation.
+
+The following versions were used for training and evaluation:
+
+- python >= 3.10
+- torch               : 2.9.0+cu128
+- lm_eval             : 0.4.9.1
+- vllm                : 0.11.1
+- transformers        : 4.57.6
+- datasets            : 3.2.0
+- numpy               : 2.2.6
+
+## Training
+
+The [TRL](https://github.com/huggingface/trl) library was used for full-rank supervised fine-tuning (SFT):
+
+```bash
+python trl/scripts/sft.py --model_name_or_path Qwen/Qwen3-0.6B --dataset_name openai/gsm8k --dataset_config main --learning_rate 2e-5 \
+--num_train_epochs 1 --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --eos_token '<|im_end|>' --eval_strategy steps \
+--eval_steps 100 --completion_only_loss True --report_to wandb --output_dir /path/to/the/finetuned/model
+```
+
+This is far from the most compute-efficient or best-performing fine-tuning setup, but it can serve as a reasonable baseline.
+
+The dataset was preprocessed into the conversational prompt/completion format:
+
+```python
+# trl/scripts/sft.py
+
+dataset = load_dataset(...)
+
+def preprocess_function(example):
+    return {
+        "prompt": [{"role": "user", "content": example["question"]}],
+        "completion": [
+            {"role": "assistant", "content": example["answer"]}
+        ],
+    }
+
+dataset = dataset.map(preprocess_function)
+```
+
+## Evaluation
+
+Evaluation was done with lm_eval on the test split of gsm8k:
+
+```bash
+python -m lm_eval --model vllm --model_args pretrained=${model},tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.9,data_parallel_size=1 \
+ --tasks gsm8k --batch_size 1 --apply_chat_template=True --confirm_run_unsafe_code --trust_remote_code
+```
+
+### Results
+
+| Model                 | gsm8k |
+|-----------------------|-------|
+| Qwen3-0.6B            | 21.0 |
+| Qwen3-0.6B-Math       | 46.3 |
+
+## License
+
+Please refer to the licenses of the original model [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) and the dataset [gsm8k](https://huggingface.co/datasets/openai/gsm8k).