---
license: apache-2.0
tags:
- unsloth
- trl
- sft
- instruction-following
- reasoning
datasets:
- patrickfleith/instruction-freak-reasoning
language:
- en
base_model: Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: transformers
---

# Qwen3-0.6B-IF-Expert

This project performs full fine-tuning of the Qwen3-0.6B language model to enhance its instruction-following and reasoning capabilities. Training was conducted on the `patrickfleith/instruction-freak-reasoning` dataset in bfloat16 (bf16) precision for efficient optimization.
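For reference, a minimal inference sketch using the `transformers` generation API. This is an illustration, not the authors' published usage snippet: the generation settings are assumptions, and running the guarded section downloads the model weights.

```python
def build_messages(instruction: str) -> list[dict]:
    """Wrap a user instruction in the chat format expected by apply_chat_template."""
    return [{"role": "user", "content": instruction}]


if __name__ == "__main__":
    # Heavy imports are kept inside the guard so the helper above stays
    # importable without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "suayptalha/Qwen3-0.6B-IF-Expert"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype="bfloat16")

    prompt = tokenizer.apply_chat_template(
        build_messages("Explain step by step why the sky is blue."),
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=512)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:],
                           skip_special_tokens=True))
```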

## Training Procedure

1. **Dataset Preparation**
   - The `patrickfleith/instruction-freak-reasoning` dataset was used.
   - Each example pairs a complex instruction with an in-depth, reasoning-based response.
   - Prompts were structured to encourage chain-of-thought-style outputs where applicable.
2. **Model Loading and Configuration**
   - The Qwen3-0.6B base model weights were loaded via the `unsloth` library in bf16 precision.
   - All model layers were updated (`full_finetuning=True`) to fully adapt the model to instruction understanding and stepwise response generation.
3. **Supervised Fine-Tuning**
   - Fine-tuning was conducted with the Hugging Face TRL library using the supervised fine-tuning (SFT) approach.
   - The model was trained to follow detailed instructions, reason logically, and generate structured responses.
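The three steps above can be sketched roughly as follows. This is a hedged reconstruction, not the actual training script: the dataset column names (`instruction`, `response`) and all hyperparameters are assumptions, and newer TRL/unsloth versions may differ in argument names.

```python
def to_chat(example: dict) -> dict:
    """Map a dataset row to the `messages` format TRL's SFTTrainer consumes.

    The field names `instruction` and `response` are assumptions; check the
    dataset's actual column names before running.
    """
    return {
        "messages": [
            {"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["response"]},
        ]
    }


if __name__ == "__main__":
    # Requires a CUDA GPU plus the unsloth, trl, and datasets packages.
    from datasets import load_dataset
    from trl import SFTConfig, SFTTrainer
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="Qwen/Qwen3-0.6B",
        dtype=None,            # auto-selects bf16 on supported hardware
        full_finetuning=True,  # update all layers, not just LoRA adapters
    )

    dataset = load_dataset("patrickfleith/instruction-freak-reasoning", split="train")
    dataset = dataset.map(to_chat, remove_columns=dataset.column_names)

    trainer = SFTTrainer(
        model=model,
        processing_class=tokenizer,
        train_dataset=dataset,
        args=SFTConfig(
            output_dir="qwen3-0.6b-if-expert",
            per_device_train_batch_size=2,  # illustrative hyperparameters,
            gradient_accumulation_steps=4,  # not the values actually used
            num_train_epochs=1,
            learning_rate=2e-5,
            bf16=True,
            logging_steps=10,
        ),
    )
    trainer.train()
```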

## Purpose and Outcome

- The model's ability to follow complex instructions and explain its reasoning process is significantly improved over the base model.
- It generates both coherent reasoning steps and conclusive answers, improving transparency and usability for instruction-based tasks.

## License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

## Support

Buy Me A Coffee

## Description

Model synced from source: suayptalha/Qwen3-0.6B-IF-Expert