Files
Qwen3-0.6B-IF-Expert/README.md
ModelHub XC c08027146f 初始化项目,由ModelHub XC社区提供模型
Model: suayptalha/Qwen3-0.6B-IF-Expert
Source: Original Platform
2026-05-06 21:02:57 +08:00

52 lines
2.0 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
license: apache-2.0
tags:
- unsloth
- trl
- sft
- instruction-following
- reasoning
datasets:
- patrickfleith/instruction-freak-reasoning
language:
- en
base_model:
- Qwen/Qwen3-0.6B
pipeline_tag: text-generation
library_name: transformers
---
# Qwen3-0.6B-IF-Expert
This project performs full fine-tuning on the **Qwen3-0.6B** language model to enhance its **instruction-following** and **reasoning** capabilities. Training was conducted on the `patrickfleith/instruction-freak-reasoning` dataset using bfloat16 (bf16) precision for efficient optimization.
## Training Procedure
1. **Dataset Preparation**
* The `patrickfleith/instruction-freak-reasoning` dataset was used.
* Each example contains a complex instruction paired with an in-depth reasoning-based response.
* Prompts were structured to encourage chain-of-thought style outputs when applicable.
2. **Model Loading and Configuration**
* Qwen3 base model weights were loaded via the `unsloth` library in bf16 precision.
* All model layers were fully updated (`full_finetuning=True`) to effectively adapt the model to instruction understanding and stepwise response generation.
3. **Supervised Fine-Tuning**
* Fine-tuning was conducted using the Hugging Face TRL library with the Supervised Fine-Tuning (SFT) approach.
* The model was trained to follow detailed instructions, reason logically, and generate structured responses.
## Purpose and Outcome
* The models ability to follow complex instructions and explain its reasoning process has been significantly enhanced.
* It generates both coherent reasoning steps and conclusive answers, improving transparency and usability for instruction-based tasks.
## License
This project is licensed under the Apache License 2.0. See the [LICENSE](./LICENSE) file for details.
## Support
<a href="https://www.buymeacoffee.com/suayptalha" target="_blank"><img src="https://cdn.buymeacoffee.com/buttons/v2/default-yellow.png" alt="Buy Me A Coffee" style="height: 60px !important;width: 217px !important;" ></a>