---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- sft
- fine-tuned
- trl
- lora
- qlora
- text-generation
- reasoning
- conversational
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
datasets:
- ermiaazarkhalili/claude-reasoning-distillation
model-index:
- name: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
results: []
---
# Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
This model is a fine-tuned version of [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) optimized for **reasoning distillation (chain-of-thought)** using [Unsloth](https://github.com/unslothai/unsloth) for **2x faster training** and **60% less VRAM**.
Trained on the [claude-reasoning-distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset, which contains 10,477 samples of Claude's reasoning traces with `<think>` blocks for chain-of-thought learning.
## Overview
| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Apache-2.0 |
| **Language** | English |
| **Base Model** | [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) |
| **Model Size** | 8B parameters |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) |
| **Training Method** | SFT with QLoRA (4-bit) |
| **Context Length** | 2,048 tokens |
| **GGUF Available** | [Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF) |
## Training Configuration
### SFT + LoRA Settings
| Parameter | Value |
|-----------|-------|
| Unsloth Class | `FastLanguageModel` |
| Chat Template | built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth optimized) |
| Seed | 3407 |
### LoRA Configuration
| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
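The two tables above can be summarized as a training-setup sketch. This is not the exact training script, only a reconstruction of the listed hyperparameters using the current Unsloth and TRL APIs; argument names outside the tables (e.g. `use_gradient_checkpointing="unsloth"`) are assumptions:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig

# Load the 4-bit base model (QLoRA quantization)
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/qwen3-8b-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters per the table above
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

args = SFTConfig(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size 8
    learning_rate=2e-4,
    num_train_epochs=1,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=5,
    seed=3407,
)
```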
### Dataset
| Property | Value |
|----------|-------|
| Dataset | [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) |
| Training Samples | 10,477 |
| Format | Messages with `thinking` field for chain-of-thought |
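If you preprocess the dataset yourself, the `thinking` field can be folded into the assistant turn as a `<think>` block before applying the chat template. A minimal sketch; the field names (`messages`, `thinking`) follow the table above, but check the dataset card for the exact schema:

```python
def fold_thinking(sample):
    """Merge the sample's `thinking` field into the assistant reply as a <think> block."""
    messages = []
    for msg in sample["messages"]:
        if msg["role"] == "assistant" and sample.get("thinking"):
            content = f"<think>\n{sample['thinking']}\n</think>\n\n{msg['content']}"
            messages.append({"role": "assistant", "content": content})
        else:
            messages.append(msg)
    return messages

sample = {
    "messages": [
        {"role": "user", "content": "What is 2 + 2?"},
        {"role": "assistant", "content": "4"},
    ],
    "thinking": "2 + 2 equals 4.",
}
print(fold_thinking(sample)[1]["content"])
```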
### Hardware
| Property | Value |
|----------|-------|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | [Papermill](https://github.com/nteract/papermill) on SLURM |
### Training Outcome
| Metric | Value |
|--------|-------|
| SLURM Job ID | `36885901` |
| Runtime | 40m 30s (2430s) |
| Final Training Loss | 0.8753 |
| Peak VRAM | 14.23 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |
## Usage
### Quick Start (Transformers)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
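Since the model was trained on `<think>` traces, responses typically contain the reasoning before the final answer. A small helper to separate the two (the tag format is an assumption based on the dataset description above):

```python
import re

def split_think(response: str):
    """Separate a <think>...</think> reasoning trace from the final answer."""
    m = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if m:
        reasoning = m.group(1).strip()
        answer = response[m.end():].strip()
        return reasoning, answer
    return None, response.strip()

reasoning, answer = split_think(
    "<think>\n2+3+5+7+11+13+17+19+23+29 = 129\n</think>\nThe sum is 129."
)
print(answer)  # The sum is 129.
```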
### Using with Unsloth (Fastest)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to Unsloth's optimized inference mode
```
### 4-bit Quantized Inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
```
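As a rough sanity check on hardware requirements, 4-bit weights for an 8B model fit comfortably on consumer GPUs. A back-of-envelope estimate (weights only; activations, KV cache, and per-block quantization overhead add more):

```python
# Rough VRAM estimate for 4-bit (NF4) weights of an 8B-parameter model
n_params = 8e9          # 8B parameters
bytes_per_param = 0.5   # 4 bits per weight
weights_gb = n_params * bytes_per_param / 1024**3
print(f"~{weights_gb:.1f} GB for weights alone")  # ~3.7 GB
```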
## GGUF Versions
Quantized GGUF versions for CPU and edge inference are available at:
**[Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF)**
| Format | Description |
|--------|-------------|
| `Q4_K_M` | Recommended — good balance of quality and size |
| `Q5_K_M` | Higher quality, slightly larger |
| `Q8_0` | Near-lossless, largest GGUF size |
### Using with Ollama
```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M "Solve step by step: What is the sum of the first 10 prime numbers?"
```
### Using with llama.cpp
```bash
./llama-cli -m Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf -p "Solve step by step: What is the sum of the first 10 prime numbers?" -n 512
```
## Limitations
- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with 2,048 token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails
## Training Framework Versions
| Package | Version |
|---------|---------|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |
## Citation
```bibtex
@misc{ermiaazarkhalili_qwen3_8b_sft_claude_opus_reasoning_unsloth,
author = {ermiaazarkhalili},
title = {Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth}}
}
```
## Acknowledgments
- [Unsloth](https://github.com/unslothai/unsloth) for 2x faster fine-tuning
- The Qwen team for the base Qwen3-8B model, quantized to 4-bit by Unsloth
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset
- [Compute Canada / DRAC](https://alliancecan.ca/) for HPC resources