---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- sft
- fine-tuned
- trl
- lora
- qlora
- text-generation
- reasoning
- conversational
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
datasets:
- ermiaazarkhalili/claude-reasoning-distillation
model-index:
- name: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
  results: []
---

# Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth

This model is a fine-tuned version of [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) optimized for **chain-of-thought reasoning distillation**, trained with [Unsloth](https://github.com/unslothai/unsloth) for **2x faster training** and **60% less VRAM**. It was trained on the [claude-reasoning-distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset, which contains 10,477 samples of Claude's reasoning traces stored in a `thinking` field for chain-of-thought learning.

## Overview

| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Apache-2.0 |
| **Language** | English |
| **Base Model** | [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) |
| **Model Size** | 8B parameters |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) |
| **Training Method** | SFT with QLoRA (4-bit) |
| **Context Length** | 2,048 tokens |
| **GGUF Available** | [Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF) |

## Training Configuration

### SFT + LoRA Settings

| Parameter | Value |
|-----------|-------|
| Unsloth Class | `FastLanguageModel` |
| Chat Template | Built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth optimized) |
| Seed | 3407 |

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |

### Dataset

| Property | Value |
|----------|-------|
| Dataset | [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) |
| Training Samples | 10,477 |
| Format | Messages with `thinking` field for chain-of-thought |

### Hardware

| Property | Value |
|----------|-------|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | [Papermill](https://github.com/nteract/papermill) on SLURM |

### Training Outcome

| Metric | Value |
|--------|-------|
| SLURM Job ID | `36885901` |
| Runtime | 40m 30s (2,430 s) |
| Final Training Loss | 0.8753 |
| Peak VRAM | 14.23 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |

## Usage

### Quick Start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build the prompt with the model's chat template
messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Using with Unsloth (Fastest)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
```

### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)
```

## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at [Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF).

| Format | Description |
|--------|-------------|
| `Q4_K_M` | Recommended: good balance of quality and size |
| `Q5_K_M` | Higher quality, slightly larger |
| `Q8_0` | Near-lossless, largest GGUF size |

### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M "Solve step by step: What is the sum of the first 10 prime numbers?"
```

### Using with llama.cpp

```bash
./llama-cli -m Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf \
    -p "Solve step by step: What is the sum of the first 10 prime numbers?" \
    -n 512
```

## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails

## Training Framework Versions

| Package | Version |
|---------|---------|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |

## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_sft_claude_opus_reasoning_unsloth,
  author       = {ermiaazarkhalili},
  title        = {Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth}}
}
```

## Acknowledgments

- [Unsloth](https://github.com/unslothai/unsloth) for 2x faster fine-tuning
- The base model developers (Unsloth)
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset
- [Compute Canada / DRAC](https://alliancecan.ca/) for HPC resources
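## Appendix: Separating the Reasoning Trace

Because the model is trained to emit its chain-of-thought before the final answer, downstream code often needs to split the two. The sketch below is a minimal, hypothetical helper that assumes Qwen3-style `<think>...</think>` delimiters in the decoded output; the exact tag format depends on the tokenizer's chat template, so verify against your decoded text before relying on it.

```python
import re


def split_reasoning(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumes Qwen3-style <think>...</think> delimiters around the
    chain-of-thought. If no such block is found, the reasoning part
    is returned empty and the whole response is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is the user-facing answer
    answer = response[match.end():].strip()
    return reasoning, answer


# Example with a mock response (not actual model output):
demo = "<think>2+3+5+7+11+13+17+19+23+29 = 129</think>\nThe sum is 129."
reasoning, answer = split_reasoning(demo)
print(answer)  # -> The sum is 129.
```

When generating, pass `skip_special_tokens=False` (or check your template) if the thinking delimiters are registered as special tokens, since decoding with `skip_special_tokens=True` may strip them before this helper sees them.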