---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- sft
- fine-tuned
- trl
- lora
- qlora
- text-generation
- reasoning
- conversational
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
datasets:
- ermiaazarkhalili/claude-reasoning-distillation
model-index:
- name: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
  results: []
---

# Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth

This model is a fine-tuned version of Qwen3-8B (Unsloth 4-bit), optimized for chain-of-thought reasoning distillation and trained with Unsloth for roughly 2x faster training and about 60% less VRAM.

Trained on the claude-reasoning-distillation dataset, which contains 10,477 samples of Claude's reasoning traces with `<think>` blocks for chain-of-thought learning.

## Overview

| Property | Value |
|---|---|
| Developed by | ermiaazarkhalili |
| License | Apache-2.0 |
| Language | English |
| Base Model | Qwen3-8B (Unsloth 4-bit) |
| Model Size | 8B parameters |
| Training Framework | Unsloth + TRL |
| Training Method | SFT with QLoRA (4-bit) |
| Context Length | 2,048 tokens |
| GGUF Available | Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF |

## Training Configuration

### SFT + LoRA Settings

| Parameter | Value |
|---|---|
| Unsloth Class | `FastLanguageModel` |
| Chat Template | Built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full pass over the dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth-optimized) |
| Seed | 3407 |
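
The settings above correspond roughly to a TRL `SFTConfig` like the following. This is a hedged reconstruction, not the published training script; argument names follow TRL's `SFTConfig` / `TrainingArguments`:

```python
from trl import SFTConfig

# Hypothetical reconstruction of the hyperparameters in the table above.
config = SFTConfig(
    output_dir="outputs",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size = 1 x 8 = 8
    num_train_epochs=1,              # one full pass over the dataset
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=5,
    seed=3407,
)
```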

### LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
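
The same adapter settings expressed as a PEFT `LoraConfig`, for reference. Unsloth applies these via `FastLanguageModel.get_peft_model`, so this sketch is not the literal training code, and `task_type` is an assumption:

```python
from peft import LoraConfig

# Mirrors the table above; task_type is assumed for causal LM fine-tuning.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```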

## Dataset

| Property | Value |
|---|---|
| Dataset | Claude Reasoning Distillation |
| Training Samples | 10,477 |
| Format | Messages with a `thinking` field for chain-of-thought |
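
A hypothetical sketch of how a record's `thinking` field can be folded into a Qwen3-style assistant turn. The exact dataset schema and field names are assumptions, not documented here:

```python
# Hedged sketch: render a reasoning trace plus final answer into the
# <think>-block format that Qwen3 chat templates use.
def render_assistant_turn(thinking: str, answer: str) -> str:
    """Prepend the reasoning trace in a <think> block, then the final answer."""
    return f"<think>\n{thinking}\n</think>\n{answer}"

turn = render_assistant_turn(
    thinking="2 + 3 + 5 + 7 + 11 + 13 + 17 + 19 + 23 + 29 = 129",
    answer="The sum of the first 10 primes is 129.",
)
```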

## Hardware

| Property | Value |
|---|---|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | Papermill on SLURM |

## Training Outcome

| Metric | Value |
|---|---|
| SLURM Job ID | 36885901 |
| Runtime | 40m 30s (2,430 s) |
| Final Training Loss | 0.8753 |
| Peak VRAM | 14.23 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |
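
As a rough consistency check on the numbers above (an estimate only; wall time also includes data loading and logging overhead), the implied step count and throughput:

```python
# Back-of-envelope throughput from the reported training numbers.
samples = 10_477
effective_batch = 8
runtime_s = 40 * 60 + 30                 # 40m 30s = 2430 s

steps = -(-samples // effective_batch)   # ceil: optimizer steps in one epoch
sec_per_step = runtime_s / steps
print(steps, round(sec_per_step, 2))     # 1310 steps, ~1.85 s/step
```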

## Usage

### Quick Start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]

# Apply the built-in Qwen3 chat template, then generate.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
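
Qwen3-style models emit their chain-of-thought inside `<think>…</think>` tags. A small helper to separate the reasoning from the final answer (assuming at most one well-formed block per response):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Assumes at most one <think>...</think> block, as produced by the
    Qwen3 chat template; reasoning is empty if no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(
    "<think>\n2+3+5+7+11+13+17+19+23+29 = 129\n</think>\nThe sum is 129."
)
```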

### Using with Unsloth (Fastest)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
```

### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)
```

## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF

| Format | Description |
|---|---|
| Q4_K_M | Recommended: good balance of quality and size |
| Q5_K_M | Higher quality, slightly larger |
| Q8_0 | Near-lossless, largest GGUF size |
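
Approximate on-disk sizes for an ~8B-parameter model can be estimated from typical bits-per-weight of each quant type. The bpw figures below are rough community averages, not exact values for these files:

```python
# Rough GGUF size estimates; parameter count and bits-per-weight are approximate.
params = 8.2e9                      # Qwen3-8B, approximate
bits_per_weight = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.5}

sizes_gib = {name: params * bpw / 8 / 2**30 for name, bpw in bits_per_weight.items()}
for name, gib in sizes_gib.items():
    print(f"{name}: ~{gib:.1f} GiB")
```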

### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M "Solve step by step: What is the sum of the first 10 prime numbers?"
```

### Using with llama.cpp

```bash
./llama-cli -m Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf -p "Solve step by step: What is the sum of the first 10 prime numbers?" -n 512
```

## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails

## Training Framework Versions

| Package | Version |
|---|---|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |

## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_sft_claude_opus_reasoning_unsloth,
    author = {ermiaazarkhalili},
    title = {Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
    year = {2026},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth}}
}
```

## Acknowledgments

This model builds on the Qwen3-8B base model by the Qwen team, fine-tuned with Unsloth and TRL on DRAC (Compute Canada) infrastructure.
