---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- sft
- fine-tuned
- trl
- lora
- qlora
- text-generation
- reasoning
- conversational
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
datasets:
- ermiaazarkhalili/claude-reasoning-distillation
model-index:
- name: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
results: []
---
# Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
This model is a fine-tuned version of [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) optimized for **reasoning distillation (chain-of-thought)** using [Unsloth](https://github.com/unslothai/unsloth) for **2x faster training** and **60% less VRAM**.
Trained on the [claude-reasoning-distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset, which contains 10,477 samples of Claude's reasoning traces with `<think>` blocks for chain-of-thought learning.
## Overview
| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Apache-2.0 |
| **Language** | English |
| **Base Model** | [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) |
| **Model Size** | 8B parameters |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) |
| **Training Method** | SFT with QLoRA (4-bit) |
| **Context Length** | 2,048 tokens |
| **GGUF Available** | [Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF) |
## Training Configuration
### SFT + LoRA Settings
| Parameter | Value |
|-----------|-------|
| Unsloth Class | `FastLanguageModel` |
| Chat Template | built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth optimized) |
| Seed | 3407 |
### LoRA Configuration
| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
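The two tables above can be summarized as a training-setup sketch. This is not the exact training script, only a reconstruction of the listed hyperparameters using the current Unsloth and TRL APIs; argument names outside the tables (e.g. `use_gradient_checkpointing="unsloth"`) are assumptions:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig

# Load the 4-bit base model (QLoRA quantization)
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/qwen3-8b-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters per the table above
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

args = SFTConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size 8
    learning_rate=2e-4,
    num_train_epochs=1,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=5,
    seed=3407,
)
```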
### Dataset
| Property | Value |
|----------|-------|
| Dataset | [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) |
| Training Samples | 10,477 |
| Format | Messages with `thinking` field for chain-of-thought |
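If you preprocess the dataset yourself, the `thinking` field can be folded into the assistant turn as a `<think>` block before applying the chat template. A minimal sketch; the field names (`messages`, `thinking`) follow the table above, but check the dataset card for the exact schema:

```python
def fold_thinking(sample):
    """Merge the sample's `thinking` field into the assistant reply as a <think> block."""
    messages = []
    for msg in sample["messages"]:
        if msg["role"] == "assistant" and sample.get("thinking"):
            content = f"<think>\n{sample['thinking']}\n</think>\n\n{msg['content']}"
            messages.append({"role": "assistant", "content": content})
        else:
            messages.append(msg)
    return messages

sample = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
    "thinking": "2 + 2 equals 4.",
}
print(fold_thinking(sample)[1]["content"])
```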
### Hardware
| Property | Value |
|----------|-------|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | [Papermill](https://github.com/nteract/papermill) on SLURM |
### Training Outcome
| Metric | Value |
|--------|-------|
| SLURM Job ID | `36885901` |
| Runtime | 40m 30s (2430s) |
| Final Training Loss | 0.8753 |
| Peak VRAM | 14.23 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |
## Usage
### Quick Start (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
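Since the model was trained on `<think>` traces, responses typically contain the reasoning before the final answer. A small helper to separate the two (the tag format is an assumption based on the dataset description above):

```python
import re

def split_think(response: str):
    """Separate a <think>...</think> reasoning trace from the final answer."""
    m = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if m:
        reasoning = m.group(1).strip()
        answer = response[m.end():].strip()
        return reasoning, answer
    return None, response.strip()

reasoning, answer = split_think(
    "<think>\n2+3+5+7+11+13+17+19+23+29 = 129\n</think>\nThe sum is 129."
)
print(answer)  # The sum is 129.
```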
### Using with Unsloth (Fastest)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's optimized inference mode
```
### 4-bit Quantized Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
```
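As a rough sanity check on hardware requirements, 4-bit weights for an 8B model fit comfortably on consumer GPUs. A back-of-envelope estimate (weights only; activations, KV cache, and per-block quantization overhead add more):

```python
# Rough VRAM estimate for 4-bit (NF4) weights of an 8B-parameter model
n_params = 8e9          # 8B parameters
bytes_per_param = 0.5   # 4 bits per weight
weights_gb = n_params * bytes_per_param / 1024**3
print(f"~{weights_gb:.1f} GB for weights alone")  # ~3.7 GB
```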
## GGUF Versions
Quantized GGUF versions for CPU and edge inference are available at:
**[Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF)**
| Format | Description |
|--------|-------------|
| `Q4_K_M` | Recommended — good balance of quality and size |
| `Q5_K_M` | Higher quality, slightly larger |
| `Q8_0` | Near-lossless, largest GGUF size |
### Using with Ollama
```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M "Solve step by step: What is the sum of the first 10 prime numbers?"
```
### Using with llama.cpp
```bash
./llama-cli -m Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf -p "Solve step by step: What is the sum of the first 10 prime numbers?" -n 512
```
## Limitations
- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with 2,048 token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails
## Training Framework Versions
| Package | Version |
|---------|---------|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |
## Citation
```bibtex
@misc{ermiaazarkhalili_qwen3_8b_sft_claude_opus_reasoning_unsloth,
author = {ermiaazarkhalili},
title = {Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth}}
}
```
## Acknowledgments
- [Unsloth](https://github.com/unslothai/unsloth) for 2x faster fine-tuning
- The Qwen team for the base Qwen3-8B model, quantized to 4-bit by Unsloth
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset
- [Compute Canada / DRAC](https://alliancecan.ca/) for HPC resources