---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
---
Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
This model is a fine-tuned version of Qwen3-8B (Unsloth 4-bit), optimized for chain-of-thought reasoning distillation and trained with Unsloth for roughly 2x faster training and about 60% less VRAM.
It was trained on the claude-reasoning-distillation dataset, which contains 10,477 samples of Claude reasoning traces wrapped in `<think>` blocks for chain-of-thought learning.
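For reference, a single training example has roughly the following shape; the field names below are an assumption based on the dataset description (chat messages plus a `thinking` field), not a verbatim sample from the dataset.

```python
# Hypothetical example record; field names are assumed from the card's
# description ("messages with a thinking field"), not copied from the dataset.
example = {
    "messages": [
        {"role": "user", "content": "What is the sum of the first 10 prime numbers?"},
        {
            "role": "assistant",
            "thinking": "The first 10 primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29; they sum to 129.",
            "content": "The sum of the first 10 prime numbers is 129.",
        },
    ]
}
```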
Overview
| Property | Value |
|---|---|
| Developed by | ermiaazarkhalili |
| License | Apache-2.0 |
| Language | English |
| Base Model | Qwen3-8B (Unsloth 4-bit) |
| Model Size | 8B parameters |
| Training Framework | Unsloth + TRL |
| Training Method | SFT with QLoRA (4-bit) |
| Context Length | 2,048 tokens |
| GGUF Available | Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF |
Training Configuration
SFT + LoRA Settings
| Parameter | Value |
|---|---|
| Unsloth Class | FastLanguageModel |
| Chat Template | built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth optimized) |
| Seed | 3407 |
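As a rough sketch, the settings above map onto a TRL `SFTConfig` along these lines; argument names follow the standard TRL/Transformers API, and any value not listed in the table (such as `output_dir`) is a placeholder rather than taken from the actual run.

```python
from trl import SFTConfig

# Approximate reconstruction of the training arguments from the table above.
training_args = SFTConfig(
    output_dir="outputs",            # placeholder
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size 1 x 8 = 8
    learning_rate=2e-4,
    num_train_epochs=1,              # one pass over the full dataset
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=5,
    seed=3407,
)
# This config is then passed to trl.SFTTrainer via args=training_args.
```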
LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
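In Unsloth terms, the adapter above corresponds to a `FastLanguageModel.get_peft_model` call roughly as follows; this is a sketch based on the table, not the exact training script.

```python
from unsloth import FastLanguageModel

# Attach QLoRA adapters matching the configuration table above.
model = FastLanguageModel.get_peft_model(
    model,  # the 4-bit base model loaded with FastLanguageModel.from_pretrained
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth-optimized checkpointing
    random_state=3407,
)
```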
Dataset
| Property | Value |
|---|---|
| Dataset | Claude Reasoning Distillation |
| Training Samples | 10,477 |
| Format | Chat messages with a `thinking` field for chain-of-thought |
Hardware
| Property | Value |
|---|---|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | Papermill on SLURM |
Training Outcome
| Metric | Value |
|---|---|
| SLURM Job ID | 36885901 |
| Runtime | 40m 30s (2430s) |
| Final Training Loss | 0.8753 |
| Peak VRAM | 14.23 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |
Usage
Quick Start (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
model_id,
torch_dtype=torch.bfloat16,
device_map="auto",
)
messages = [
{"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
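Because the model was trained on `<think>` reasoning traces, the generated text usually contains a reasoning block before the final answer. A minimal way to separate the two, assuming a single `<think>...</think>` block survives in the decoded output:

```python
import re

# Split the decoded response into the reasoning trace and the final answer.
# Assumes at most one <think>...</think> block; adjust if the output differs.
match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
reasoning = match.group(1).strip() if match else ""
answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()

print("Reasoning:", reasoning)
print("Answer:", answer)
```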
Using with Unsloth (Fastest)
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained(
"ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
max_seq_length=2048,
load_in_4bit=True,
)
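Generation then follows the same chat-template pattern as the Transformers example; `FastLanguageModel.for_inference` is Unsloth's standard switch into its faster inference mode (a general Unsloth call, not specific to this checkpoint):

```python
# Enable Unsloth's optimized inference mode, then generate as usual.
FastLanguageModel.for_inference(model)

messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```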
4-bit Quantized Inference
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
"ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
quantization_config=quantization_config,
device_map="auto",
)
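The snippet above only loads the quantized model; the tokenizer and generation loop are the same as in the Quick Start example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
)
# Build the prompt with tokenizer.apply_chat_template and call model.generate
# exactly as shown in the Quick Start section above.
```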
GGUF Versions
Quantized GGUF versions for CPU and edge inference are available at: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF
| Format | Description |
|---|---|
| `Q4_K_M` | Recommended: good balance of quality and size |
| `Q5_K_M` | Higher quality, slightly larger |
| `Q8_0` | Near-lossless, largest GGUF size |
Using with Ollama
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M "Solve step by step: What is the sum of the first 10 prime numbers?"
Using with llama.cpp
./llama-cli -m Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf -p "Solve step by step: What is the sum of the first 10 prime numbers?" -n 512
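For calling a downloaded GGUF file from Python instead of the CLI, the `llama-cpp-python` bindings work with the same quantized file. This is an illustrative sketch only (the package is not part of the training stack listed below), and the local file path is a placeholder:

```python
from llama_cpp import Llama

# Load a locally downloaded GGUF quant; the path below is a placeholder.
llm = Llama(
    model_path="Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf",
    n_ctx=2048,
)
result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}],
    max_tokens=512,
)
print(result["choices"][0]["message"]["content"])
```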
Limitations
- Language: Primarily trained on English data
- Knowledge Cutoff: Limited to base model's training data cutoff
- Hallucinations: May generate plausible-sounding but incorrect information
- Context Length: Fine-tuned with 2,048 token context window
- Safety: Not extensively safety-tuned; use with appropriate guardrails
Training Framework Versions
| Package | Version |
|---|---|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |
Citation
@misc{ermiaazarkhalili_qwen3_8b_sft_claude_opus_reasoning_unsloth,
author = {ermiaazarkhalili},
title = {Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth}}
}
Acknowledgments
- Unsloth for 2x faster fine-tuning
- Base model developers (unsloth)
- Hugging Face TRL Team for the training library
- Claude Reasoning Distillation dataset
- Compute Canada / DRAC for HPC resources