---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- sft
- fine-tuned
- trl
- lora
- qlora
- text-generation
- reasoning
- conversational
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
datasets:
- ermiaazarkhalili/claude-reasoning-distillation
model-index:
- name: Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth
  results: []
---

# Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth

This model is a fine-tuned version of [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit), optimized for **reasoning distillation (chain-of-thought)** and trained with [Unsloth](https://github.com/unslothai/unsloth) for **2x faster training** and **60% less VRAM**.

It was trained on the [claude-reasoning-distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset, which contains 10,477 samples of Claude's reasoning traces with `<think>` blocks for chain-of-thought learning.

## Overview

| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Apache-2.0 |
| **Language** | English |
| **Base Model** | [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) |
| **Model Size** | 8B parameters |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) |
| **Training Method** | SFT with QLoRA (4-bit) |
| **Context Length** | 2,048 tokens |
| **GGUF Available** | [Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF) |

## Training Configuration

### SFT + LoRA Settings

| Parameter | Value |
|-----------|-------|
| Unsloth Class | `FastLanguageModel` |
| Chat Template | Built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth-optimized) |
| Seed | 3407 |
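
These settings map directly onto TRL's `SFTConfig`. A minimal sketch of how they might be wired up (not the original training script; `output_dir` and the `dataset` variable are placeholders, and the model comes from the LoRA setup sketched in the next section):

```python
from trl import SFTConfig, SFTTrainer

# Hyperparameters from the table above; output_dir is a placeholder
training_args = SFTConfig(
    output_dir="outputs",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,   # effective batch size 8
    num_train_epochs=1,              # one pass over the full dataset
    learning_rate=2e-4,
    optim="adamw_8bit",
    lr_scheduler_type="linear",
    warmup_steps=5,
    seed=3407,
)

trainer = SFTTrainer(
    model=model,                     # LoRA-wrapped model (see next section)
    processing_class=tokenizer,
    train_dataset=dataset,           # placeholder for the formatted dataset
    args=training_args,
)
trainer.train()
```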

### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
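
In Unsloth, this adapter configuration corresponds to loading the 4-bit base model and wrapping it with `get_peft_model`. A minimal sketch under the settings above:

```python
from unsloth import FastLanguageModel

# Load the 4-bit base model (QLoRA), then attach LoRA adapters
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/qwen3-8b-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",  # Unsloth-optimized checkpointing
    random_state=3407,
)
```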

### Dataset

| Property | Value |
|----------|-------|
| Dataset | [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) |
| Training Samples | 10,477 |
| Format | Messages with a `thinking` field for chain-of-thought |
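
Based on the format noted above, each sample carries chat messages whose assistant turn includes a `thinking` field alongside the final answer. A hypothetical illustration of the shape (field values are invented for clarity, not taken from the dataset):

```python
# Hypothetical sample shape; the actual contents are illustrative only
sample = {
    "messages": [
        {"role": "user", "content": "What is 12 * 13?"},
        {
            "role": "assistant",
            "thinking": "12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
            "content": "12 * 13 = 156.",
        },
    ]
}
```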

### Hardware

| Property | Value |
|----------|-------|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | [Papermill](https://github.com/nteract/papermill) on SLURM |
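
Papermill executes a parameterized notebook and writes the executed copy to disk. A minimal sketch of how a training notebook might be launched inside a SLURM job (notebook names and parameters are placeholders, not the original job setup):

```python
import papermill as pm

# Run a parameterized training notebook; names and params are placeholders
pm.execute_notebook(
    "train_sft.ipynb",
    "train_sft_output.ipynb",
    parameters={"seed": 3407, "learning_rate": 2e-4},
)
```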

### Training Outcome

| Metric | Value |
|--------|-------|
| SLURM Job ID | `36885901` |
| Runtime | 40m 30s (2,430 s) |
| Final Training Loss | 0.8753 |
| Peak VRAM | 14.23 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |

## Usage

### Quick Start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: What is the sum of the first 10 prime numbers?"}
]

# Build the prompt with the model's chat template, then tokenize
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
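
Because the model was trained on traces with `<think>` blocks, responses typically open with reasoning inside `<think>...</think>` before the final answer. A minimal sketch for separating the two, reusing the `response` string from the snippet above (assuming the tags appear literally in the decoded text):

```python
# Split a decoded response into its <think> reasoning and final answer;
# falls back to treating the whole string as the answer if no tag is found.
def split_reasoning(response: str) -> tuple[str, str]:
    if "</think>" in response:
        reasoning, _, answer = response.partition("</think>")
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", response.strip()

reasoning, answer = split_reasoning(response)
print("Reasoning:", reasoning[:200])
print("Answer:", answer)
```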

### Using with Unsloth (Fastest)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
```
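
For generation, Unsloth's inference mode can then be enabled. A minimal sketch continuing from the snippet above (the prompt is illustrative):

```python
# Switch Unsloth's kernels to inference mode for faster generation
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "Solve step by step: What is 17 * 23?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids=inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```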

### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# NF4 double-quantized 4-bit loading via bitsandbytes
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)
```

## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at
**[Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF)**:

| Format | Description |
|--------|-------------|
| `Q4_K_M` | Recommended; good balance of quality and size |
| `Q5_K_M` | Higher quality, slightly larger |
| `Q8_0` | Near-lossless, largest GGUF size |
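
To fetch a single quantization programmatically, `huggingface_hub` can be used. A minimal sketch; the filename matches the one used in the llama.cpp example below, but verify it against the repo's file listing:

```python
from huggingface_hub import hf_hub_download

# Download one GGUF quantization from the companion repo
path = hf_hub_download(
    repo_id="ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF",
    filename="Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf",
)
print(path)
```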

### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-GGUF:Q4_K_M "Solve step by step: What is the sum of the first 10 prime numbers?"
```

### Using with llama.cpp

```bash
./llama-cli -m Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth-Q4_K_M.gguf -p "Solve step by step: What is the sum of the first 10 prime numbers?" -n 512
```

## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails

## Training Framework Versions

| Package | Version |
|---------|---------|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |

## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_sft_claude_opus_reasoning_unsloth,
    author       = {ermiaazarkhalili},
    title        = {Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
    year         = {2026},
    publisher    = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-SFT-Claude-Opus-Reasoning-Unsloth}}
}
```

## Acknowledgments

- [Unsloth](https://github.com/unslothai/unsloth) for 2x faster fine-tuning
- The base model developers (Unsloth)
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- [Claude Reasoning Distillation](https://huggingface.co/datasets/ermiaazarkhalili/claude-reasoning-distillation) dataset
- [Compute Canada / DRAC](https://alliancecan.ca/) for HPC resources