---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- sft
- fine-tuned
- trl
- lora
- qlora
- text-generation
- reasoning
- conversational
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
datasets:
- ermiaazarkhalili/claude-reasoning-distillation
model-index:
- name: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
  results: []
---

# Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth

This model is a fine-tuned version of Qwen3-8B (Unsloth 4-bit), optimized for chain-of-thought reasoning distillation and trained with Unsloth for roughly 2x faster training and about 60% less VRAM.

Trained on the claude-reasoning-distillation dataset, which contains 10,477 samples of Claude's reasoning traces with `<think>` blocks for chain-of-thought learning.

## Overview

| Property | Value |
|---|---|
| Developed by | ermiaazarkhalili |
| License | Apache-2.0 |
| Language | English |
| Base Model | Qwen3-8B (Unsloth 4-bit) |
| Model Size | 8B parameters |
| Training Framework | Unsloth + TRL |
| Training Method | SFT with QLoRA (4-bit) |
| Context Length | 2,048 tokens |
| GGUF Available | Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF |

## Training Configuration

### SFT + LoRA Settings

| Parameter | Value |
|---|---|
| Unsloth Class | `FastLanguageModel` |
| Chat Template | Built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full pass over the dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth-optimized) |
| Seed | 3407 |
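
The settings above correspond roughly to a TRL `SFTConfig` like the following. This is a hedged reconstruction, not the published training script; argument names follow TRL's `SFTConfig` / `TrainingArguments`:

```python
from trl import SFTConfig

# Hypothetical reconstruction of the hyperparameters in the table above.
config = SFTConfig(
    output_dir="outputs",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size = 1 x 8 = 8
    num_train_epochs=1,              # one full pass over the dataset
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=5,
    seed=3407,
)
```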

### LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
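
The same adapter settings expressed as a PEFT `LoraConfig`, for reference. Unsloth applies these via `FastLanguageModel.get_peft_model`, so this sketch is not the literal training code, and `task_type` is an assumption:

```python
from peft import LoraConfig

# Mirrors the table above; task_type is assumed for causal LM fine-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```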

## Dataset

| Property | Value |
|---|---|
| Dataset | Claude Reasoning Distillation |
| Training Samples | 10,477 |
| Format | Messages with a `thinking` field for chain-of-thought |
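
A hypothetical sketch of how a record's `thinking` field can be folded into a Qwen3-style assistant turn. The exact dataset schema and field names are assumptions, not documented here:

```python
# Hedged sketch: render a reasoning trace plus final answer into the
# <think>-block format that Qwen3 chat templates use.
def render_assistant_turn(thinking: str, answer: str) -> str:
    """Prepend the reasoning trace in a <think> block, then the final answer."""
    return f"<think>\n{thinking}\n</think>\n{answer}"

turn = render_assistant_turn(
    thinking="2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 + 23 + 29 = 129",
    answer="The sum of the first 10 primes is 129.",
)
```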

## Hardware

| Property | Value |
|---|---|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | Papermill on SLURM |

## Training Outcome

| Metric | Value |
|---|---|
| SLURM Job ID | 36885901 |
| Runtime | 40m 30s (2,430 s) |
| Final Training Loss | 0.8753 |
| Peak VRAM | 14.23 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |
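
As a rough consistency check on the numbers above (an estimate only; wall time also includes data loading and logging overhead), the implied step count and throughput:

```python
# Back-of-envelope throughput from the reported training numbers.
samples = 10_477
effective_batch = 8
runtime_s = 40 * 60 + 30                 # 40m 30s = 2430 s

steps = -(-samples // effective_batch)   # ceil: optimizer steps in one epoch
sec_per_step = runtime_s / steps
print(steps, round(sec_per_step, 2))     # 1310 steps, ~1.85 s/step
```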

## Usage

### Quick Start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]

# Apply the built-in Qwen3 chat template, then generate.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
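
Qwen3-style models emit their chain-of-thought inside `<think>…</think>` tags. A small helper to separate the reasoning from the final answer (assuming at most one well-formed block per response):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Assumes at most one <think>...</think> block, as produced by the
    Qwen3 chat template; reasoning is empty if no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(
    "<think>\n2+3+5+7+11+13+17+19+23+29 = 129\n</think>\nThe sum is 129."
)
```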

### Using with Unsloth (Fastest)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
```

### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)
```

## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF

| Format | Description |
|---|---|
| Q4_K_M | Recommended: good balance of quality and size |
| Q5_K_M | Higher quality, slightly larger |
| Q8_0 | Near-lossless, largest GGUF size |
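
Approximate on-disk sizes for an ~8B-parameter model can be estimated from typical bits-per-weight of each quant type. The bpw figures below are rough community averages, not exact values for these files:

```python
# Rough GGUF size estimates; parameter count and bits-per-weight are approximate.
params = 8.2e9                      # Qwen3-8B, approximate
bits_per_weight = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5}

sizes_gib = {name: params * bpw / 8 / 2**30 for name, bpw in bits_per_weight.items()}
for name, gib in sizes_gib.items():
    print(f"{name}: ~{gib:.1f} GiB")
```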

### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M "Solve step by step: What is the sum of the first 10 prime numbers?"
```

### Using with llama.cpp

```bash
./llama-cli -m Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf -p "Solve step by step: What is the sum of the first 10 prime numbers?" -n 512
```

## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails

## Training Framework Versions

| Package | Version |
|---|---|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |

## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_sft_claude_opus_reasoning_unsloth,
    author = {ermiaazarkhalili},
    title = {Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
    year = {2026},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth}}
}
```

## Acknowledgments

This model builds on the Qwen3-8B base model by the Qwen team, fine-tuned with Unsloth and TRL on DRAC (Compute Canada) infrastructure.
