---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- sft
- fine-tuned
- trl
- lora
- qlora
- text-generation
- reasoning
- conversational
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
datasets:
- ermiaazarkhalili/claude-reasoning-distillation
model-index:
- name: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
  results: []
---

# Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth

This model is a fine-tuned version of [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) optimized for **chain-of-thought reasoning distillation**, trained with [Unsloth](https://github.com/unslothai/unsloth) for **2x faster training** and **60% less VRAM**. It was trained on the [claude-reasoning-distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset, which contains 10,477 samples of Claude's reasoning traces stored in a `thinking` field for chain-of-thought learning.

## Overview

| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Apache-2.0 |
| **Language** | English |
| **Base Model** | [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) |
| **Model Size** | 8B parameters |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) |
| **Training Method** | SFT with QLoRA (4-bit) |
| **Context Length** | 2,048 tokens |
| **GGUF Available** | [Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF) |

## Training Configuration

### SFT + LoRA Settings

| Parameter | Value |
|-----------|-------|
| Unsloth Class | `FastLanguageModel` |
| Chat Template | Built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth optimized) |
| Seed | 3407 |

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |

### Dataset

| Property | Value |
|----------|-------|
| Dataset | [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) |
| Training Samples | 10,477 |
| Format | Messages with `thinking` field for chain-of-thought |

### Hardware

| Property | Value |
|----------|-------|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | [Papermill](https://github.com/nteract/papermill) on SLURM |

### Training Outcome

| Metric | Value |
|--------|-------|
| SLURM Job ID | `36885901` |
| Runtime | 40m 30s (2,430 s) |
| Final Training Loss | 0.8753 |
| Peak VRAM | 14.23 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |

## Usage

### Quick Start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Build the prompt with the model's chat template
messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

### Using with Unsloth (Fastest)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
```

### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)
```

## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at [Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF).

| Format | Description |
|--------|-------------|
| `Q4_K_M` | Recommended: good balance of quality and size |
| `Q5_K_M` | Higher quality, slightly larger |
| `Q8_0` | Near-lossless, largest GGUF size |

### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M "Solve step by step: What is the sum of the first 10 prime numbers?"
```

### Using with llama.cpp

```bash
./llama-cli -m Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf \
    -p "Solve step by step: What is the sum of the first 10 prime numbers?" \
    -n 512
```

## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails

## Training Framework Versions

| Package | Version |
|---------|---------|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |

## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_sft_claude_opus_reasoning_unsloth,
  author       = {ermiaazarkhalili},
  title        = {Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth}}
}
```

## Acknowledgments

- [Unsloth](https://github.com/unslothai/unsloth) for 2x faster fine-tuning
- The base model developers (Unsloth)
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset
- [Compute Canada / DRAC](https://alliancecan.ca/) for HPC resources
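## Appendix: Separating the Reasoning Trace

Because the model is trained to emit its chain-of-thought before the final answer, downstream code often needs to split the two. The sketch below is a minimal, hypothetical helper that assumes Qwen3-style `<think>...</think>` delimiters in the decoded output; the exact tag format depends on the tokenizer's chat template, so verify against your decoded text before relying on it.

```python
import re


def split_reasoning(response: str) -> tuple[str, str]:
    """Split a decoded response into (reasoning, answer).

    Assumes Qwen3-style <think>...</think> delimiters around the
    chain-of-thought. If no such block is found, the reasoning part
    is returned empty and the whole response is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if match is None:
        return "", response.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is the user-facing answer
    answer = response[match.end():].strip()
    return reasoning, answer


# Example with a mock response (not actual model output):
demo = "<think>2+3+5+7+11+13+17+19+23+29 = 129</think>\nThe sum is 129."
reasoning, answer = split_reasoning(demo)
print(answer)  # -> The sum is 129.
```

When generating, pass `skip_special_tokens=False` (or check your template) if the thinking delimiters are registered as special tokens, since decoding with `skip_special_tokens=True` may strip them before this helper sees them.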