---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- sft
- fine-tuned
- trl
- lora
- qlora
- text-generation
- function-calling
- conversational
base_model: unsloth/qwen3-8b-unsloth-bnb-4bit
datasets:
- Salesforce/xlam-function-calling-60k
model-index:
- name: Qwen3-8B-Function-Calling-xLAM-Unsloth
  results: []
---

# Qwen3-8B-Function-Calling-xLAM-Unsloth

This model is a fine-tuned version of [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit), optimized for **function calling** and trained with [Unsloth](https://github.com/unslothai/unsloth) for **2x faster training** and **60% less VRAM**.

It was trained on [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k), a dataset of 60,000 function-calling examples, each pairing a user query with tool definitions and structured answers.
## Overview

| Property | Value |
|----------|-------|
| **Developed by** | [ermiaazarkhalili](https://huggingface.co/ermiaazarkhalili) |
| **License** | Apache-2.0 |
| **Language** | English |
| **Base Model** | [Qwen3-8B (Unsloth 4-bit)](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) |
| **Model Size** | 8B parameters |
| **Training Framework** | [Unsloth](https://github.com/unslothai/unsloth) + [TRL](https://github.com/huggingface/trl) |
| **Training Method** | SFT with QLoRA (4-bit) |
| **Context Length** | 2,048 tokens |
| **GGUF Available** | [Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF) |
## Training Configuration

### SFT + LoRA Settings

| Parameter | Value |
|-----------|-------|
| Unsloth Class | `FastLanguageModel` |
| Chat Template | Built-in Qwen3 |
| Learning Rate | 2e-4 |
| Batch Size | 1 per device |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 8 |
| Epochs | 1 (full dataset) |
| Optimizer | AdamW 8-bit |
| LR Scheduler | Linear |
| Warmup Steps | 5 |
| Precision | Auto (BF16/FP16) |
| Gradient Checkpointing | Enabled (Unsloth optimized) |
| Seed | 3407 |
### LoRA Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank (r) | 16 |
| LoRA Alpha | 16 |
| LoRA Dropout | 0 |
| Quantization | 4-bit QLoRA |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
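For reference, the two tables above map onto the standard Unsloth + TRL recipe roughly as follows. This is a minimal sketch, not the exact training script: the `dataset` variable and `output_dir` are placeholders, and argument names can differ slightly across TRL releases.

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/qwen3-8b-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,  # QLoRA: base weights stay in 4-bit
)

# Attach LoRA adapters with the settings from the tables above.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)

trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=dataset,  # placeholder: xLAM examples rendered with the Qwen3 chat template
    args=SFTConfig(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,  # effective batch size 8
        learning_rate=2e-4,
        lr_scheduler_type="linear",
        warmup_steps=5,
        num_train_epochs=1,
        optim="adamw_8bit",
        seed=3407,
        output_dir="outputs",  # placeholder
    ),
)
trainer.train()
```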
### Dataset

| Property | Value |
|----------|-------|
| Dataset | [xLAM Function Calling 60K](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) |
| Training Samples | 60,000 |
| Format | XML-tagged: `<query>`, `<tools>`, `<answers>` |
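To make the format row concrete: each xLAM record carries `query`, `tools`, and `answers` fields, which can be wrapped into a single training string as sketched below. This helper is illustrative, not the exact preprocessing code, and the precise serialization used during training may differ.

```python
def to_training_text(example: dict) -> str:
    # Wrap one xLAM record in the XML tags listed above.
    # `tools` and `answers` are JSON strings in the source dataset.
    return (
        f"<query>{example['query']}</query>\n"
        f"<tools>{example['tools']}</tools>\n"
        f"<answers>{example['answers']}</answers>"
    )
```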
### Hardware

| Property | Value |
|----------|-------|
| GPU | NVIDIA H100 80GB HBM3 (MIG 3g.40gb slice) |
| Cluster | DRAC Fir (Compute Canada) |
| Execution | [Papermill](https://github.com/nteract/papermill) on SLURM |
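The Papermill-on-SLURM setup amounts to executing the training notebook non-interactively inside a batch job. A minimal sketch using Papermill's Python API; the notebook paths and parameter names here are hypothetical:

```python
import papermill as pm

# Run the training notebook headlessly inside the SLURM job,
# saving the executed copy (with outputs) alongside it.
pm.execute_notebook(
    "train_qwen3_xlam.ipynb",          # hypothetical input notebook
    "train_qwen3_xlam_output.ipynb",   # executed copy with cell outputs
    parameters={"max_seq_length": 2048, "seed": 3407},
)
```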
### Training Outcome

| Metric | Value |
|--------|-------|
| SLURM Job ID | `36885898` |
| Runtime | 3h 48m 36s (13,716 s) |
| Final Training Loss | 0.2186 |
| Peak VRAM | 17.07 GB |
| GPU | H100 80GB HBM3 (MIG 3g.40gb) |
## Usage

### Quick Start (Transformers)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Check if the numbers 8 and 1233 are powers of two."}
]

# Render the conversation with the built-in Qwen3 chat template.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
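Since the model is tuned for function calling, you will usually want to pass tool schemas as well: `apply_chat_template` accepts a `tools` argument that the Qwen3 chat template renders into the prompt. The schema below is a hypothetical example for illustration; the model should respond with a structured tool call, though the exact output depends on the prompt and sampling settings.

```python
# Hypothetical tool schema, for illustration only.
tools = [
    {
        "type": "function",
        "function": {
            "name": "is_power_of_two",
            "description": "Check whether a number is a power of two.",
            "parameters": {
                "type": "object",
                "properties": {
                    "num": {"type": "integer", "description": "The number to check."}
                },
                "required": ["num"],
            },
        },
    }
]

# Same messages as above, now with the tool definitions in the prompt.
text = tokenizer.apply_chat_template(
    messages, tools=tools, tokenize=False, add_generation_prompt=True
)
```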
### Using with Unsloth (Fastest)

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth",
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path
```
### 4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# NF4 double quantization with BF16 compute, matching the QLoRA training setup.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth",
    quantization_config=quantization_config,
    device_map="auto",
)
```
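The tokenizer is unaffected by quantization: load it with `AutoTokenizer.from_pretrained` as in the Quick Start example, and the same chat-template and generation code applies.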
## GGUF Versions

Quantized GGUF versions for CPU and edge inference are available at
**[Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF)**:

| Format | Description |
|--------|-------------|
| `Q4_K_M` | Recommended: good balance of quality and size |
| `Q5_K_M` | Higher quality, slightly larger |
| `Q8_0` | Near-lossless, largest GGUF size |
### Using with Ollama

```bash
ollama pull hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M "Check if the numbers 8 and 1233 are powers of two."
```
### Using with llama.cpp

```bash
./llama-cli -m Qwen3-8B-Function-Calling-xLAM-Unsloth-Q4_K_M.gguf -p "Check if the numbers 8 and 1233 are powers of two." -n 512
```
## Limitations

- **Language**: Primarily trained on English data
- **Knowledge Cutoff**: Limited to the base model's training data cutoff
- **Hallucinations**: May generate plausible-sounding but incorrect information
- **Context Length**: Fine-tuned with a 2,048-token context window
- **Safety**: Not extensively safety-tuned; use with appropriate guardrails
## Training Framework Versions

| Package | Version |
|---------|---------|
| Unsloth | 2026.4.4 |
| TRL | 0.24.0 |
| Transformers | 5.5.0 |
| PyTorch | 2.9.0 |
| Datasets | 4.3.0 |
| PEFT | 0.18.1 |
| BitsAndBytes | 0.49.2 |
## Citation

```bibtex
@misc{ermiaazarkhalili_qwen3_8b_function_calling_xlam_unsloth,
  author       = {ermiaazarkhalili},
  title        = {Qwen3-8B-Function-Calling-xLAM-Unsloth: Fine-tuned Qwen3-8B (Unsloth 4-bit) with Unsloth},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth}}
}
```
## Acknowledgments

- [Unsloth](https://github.com/unslothai/unsloth) for 2x faster fine-tuning
- The Qwen team for the Qwen3-8B base model, and Unsloth for its 4-bit quantization
- [Hugging Face TRL Team](https://github.com/huggingface/trl) for the training library
- [Salesforce xLAM](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) for the function-calling dataset
- [Compute Canada / DRAC](https://alliancecan.ca/) for HPC resources