ModelHub XC 93176bf0a2 Initialize project; model provided by the ModelHub XC community
Model: ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF
Source: Original Platform
2026-06-21 09:47:16 +08:00


---
base_model: ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth
tags:
- gguf
- llama.cpp
- unsloth
- qwen3
- function-calling
- quantized
license: apache-2.0
language:
- en
datasets:
- Salesforce/xlam-function-calling-60k
pipeline_tag: text-generation
---
# Qwen3-8B-xLAM-Unsloth — GGUF quantized
GGUF quantizations of [`ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth`](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth),
produced via [Unsloth](https://github.com/unslothai/unsloth) + llama.cpp's conversion scripts.
| Field | Value |
|---|---|
| **Source checkpoint** | [`ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth`](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth) |
| **Base model** | [`unsloth/qwen3-8b-unsloth-bnb-4bit`](https://huggingface.co/unsloth/qwen3-8b-unsloth-bnb-4bit) |
| **Dataset** | [`Salesforce/xlam-function-calling-60k`](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) |
| **Training** | 1 full epoch (effective batch=8 via per_device=1 × grad_accum=8) |
| **Final training loss** | 0.219 (job 36885898, runtime 3h 48m on H100 MIG 3g.40gb) |
| **Conversion** | Unsloth `save_pretrained` → llama.cpp `convert_hf_to_gguf.py` → `llama-quantize` |
| **Quantization tool** | llama.cpp `llama-quantize` (cached toolchain) |
## Available quantizations
| File | Bits | Size | Notes |
|---|---|---|---|
| `qwen3-8b-function-calling-xlam-unsloth.q2_k.gguf` | 2-bit | 3.28 GB | Smallest; aggressive quality loss |
| `qwen3-8b-function-calling-xlam-unsloth.q3_k_m.gguf` | 3-bit | 4.12 GB | Small; noticeable quality loss |
| `qwen3-8b-function-calling-xlam-unsloth.q4_k_m.gguf` | 4-bit | 5.03 GB | **Recommended** — best size/quality balance |
| `qwen3-8b-function-calling-xlam-unsloth.q5_k_m.gguf` | 5-bit | 5.85 GB | Near-full quality |
| `qwen3-8b-function-calling-xlam-unsloth.q6_k.gguf` | 6-bit | 6.73 GB | Very close to Q8_0 at smaller size |
| `qwen3-8b-function-calling-xlam-unsloth.q8_0.gguf` | 8-bit | 8.71 GB | Largest; closest to bf16 source |

**Recommended default:** `Q4_K_M` (4-bit, K-quant medium). For maximum fidelity, use `Q8_0`.
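
As a rough aid for choosing among the files above, the sketch below picks the largest quantization that fits a given memory budget. The sizes are copied from the table; the `pick_quant` helper and its `overhead_gb` allowance for KV cache and runtime buffers are illustrative assumptions, not measured values.

```python
# File sizes in GB, copied from the quantization table in this README.
QUANT_SIZES_GB = {
    "Q2_K": 3.28,
    "Q3_K_M": 4.12,
    "Q4_K_M": 5.03,
    "Q5_K_M": 5.85,
    "Q6_K": 6.73,
    "Q8_0": 8.71,
}


def pick_quant(budget_gb, overhead_gb=1.5):
    """Return the largest quantization whose file fits in the budget.

    overhead_gb is a hypothetical allowance for KV cache and buffers;
    tune it for your context length. Returns None when nothing fits.
    """
    usable = budget_gb - overhead_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= usable]
    return max(fitting)[1] if fitting else None


if __name__ == "__main__":
    for budget in (4, 8, 12):
        print(f"{budget} GB -> {pick_quant(budget)}")
```

Actual memory use depends on context length and backend, so treat the result as a starting point and confirm it on the target machine.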
## Usage
### llama.cpp
```bash
# Single prompt, non-interactive
llama-cli -hf ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF --jinja -p "Find flights from SFO to NYC on December 25th" -n 256
# Interactive chat
llama-cli -hf ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF --jinja -cnv
```
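
The commands above send a bare prompt without any tool definitions. For function calling, the tool schemas usually need to be part of the request. The sketch below builds such a request for llama.cpp's OpenAI-compatible `/v1/chat/completions` endpoint, assuming a server started separately with `llama-server -hf ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF --jinja` on its default port 8080. The `search_flights` tool and its parameters are hypothetical and only serve as an illustration.

```python
import json
import urllib.request

# Hypothetical tool schema for illustration; not part of this repository.
SEARCH_FLIGHTS_TOOL = {
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Search for flights between two airports on a date.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}


def build_payload(prompt, tools, max_tokens=256):
    """Build an OpenAI-style chat completion request body with tools."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "max_tokens": max_tokens,
    }


def post_chat(payload, base_url="http://localhost:8080"):
    """POST the payload to a running llama-server (assumed address)."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    payload = build_payload(
        "Find flights from SFO to NYC on December 25th", [SEARCH_FLIGHTS_TOOL]
    )
    print(json.dumps(payload, indent=2))
    # With the server running: reply = post_chat(payload)
```

Only the standard library is used here so the request body stays visible; an OpenAI-compatible client library pointed at the same base URL should work as well.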
### Ollama
```bash
ollama run hf.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF:Q4_K_M
```
### llama-cpp-python
```python
from llama_cpp import Llama

# Download the GGUF file from the Hub (requires huggingface-hub) and load it.
llm = Llama.from_pretrained(
    repo_id="ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF",
    filename="*q4_k_m.gguf",  # glob pattern selecting the recommended quant
    n_ctx=2048,
)

# Run one chat turn and print the assistant's reply text.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Find flights from SFO to NYC on December 25th"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
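
The snippet above prints the reply as plain text, so any function call in it still has to be extracted. The sketch below is one possible parser. It assumes the model emits either Qwen3-style `<tool_call>…</tool_call>` blocks or an xLAM-style JSON list of `{"name", "arguments"}` objects; which format this checkpoint actually produces is not stated here, so verify against real output. `parse_tool_calls` is a hypothetical helper name.

```python
import json
import re

# Matches the JSON body of a Qwen3-style <tool_call> block (assumed format).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of {"name": ..., "arguments": ...} dicts found in text.

    Returns an empty list when the reply contains no parseable call.
    """
    # Drop an optional reasoning block before looking for calls.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    # Prefer tagged blocks; otherwise try the whole reply as JSON.
    candidates = TOOL_CALL_RE.findall(text) or [text]
    calls = []
    for chunk in candidates:
        try:
            parsed = json.loads(chunk)
        except json.JSONDecodeError:
            continue
        items = parsed if isinstance(parsed, list) else [parsed]
        calls.extend(c for c in items if isinstance(c, dict) and "name" in c)
    return calls


if __name__ == "__main__":
    reply = '<tool_call>\n{"name": "search_flights", "arguments": {"origin": "SFO"}}\n</tool_call>'
    print(parse_tool_calls(reply))
```

Returning an empty list instead of raising keeps ordinary text replies and malformed calls on the same code path, which the caller can then treat as "no call made".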
## Intended use
For research and non-commercial experimentation only. Outputs should be independently verified before any downstream use.
## Limitations
- GGUF quantization is lossy: every file here loses some quality relative to the source bfloat16 checkpoint. Use `Q5_K_M`, `Q6_K`, or `Q8_0` for best fidelity.
- Inherits all limitations of the source merged checkpoint ([`ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth`](https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth)).
- Trained only on the function schemas covered by the xLAM-60K dataset; performance on APIs unlike those may degrade.
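
Because calls to unfamiliar APIs may be malformed, and outputs should be verified before use, a structural check against the declared tool schemas is a cheap first filter. The sketch below is a minimal example of such a check, not a full JSON Schema validator; `validate_call` and the `search_flights` schema are hypothetical and exist only for illustration.

```python
# Hypothetical tool list in OpenAI-style schema form, for illustration only.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_flights",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                    "date": {"type": "string"},
                },
                "required": ["origin", "destination", "date"],
            },
        },
    }
]


def validate_call(call, tools):
    """Return a list of problems; an empty list means the call looks well formed.

    Checks only the function name and argument names, not argument types.
    """
    by_name = {t["function"]["name"]: t["function"] for t in tools}
    name = call.get("name")
    if name not in by_name:
        return [f"unknown function: {name!r}"]
    params = by_name[name].get("parameters", {})
    allowed = set(params.get("properties", {}))
    required = set(params.get("required", []))
    args = call.get("arguments") or {}
    if not isinstance(args, dict):
        return ["arguments is not an object"]
    problems = [f"missing required argument: {k}" for k in sorted(required - set(args))]
    problems += [f"unexpected argument: {k}" for k in sorted(set(args) - allowed)]
    return problems


if __name__ == "__main__":
    call = {"name": "search_flights", "arguments": {"origin": "SFO", "cabin": "economy"}}
    for problem in validate_call(call, TOOLS):
        print(problem)
```

A call that passes this check can still carry wrong values, so it complements rather than replaces review of the arguments themselves.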
## Citation
```bibtex
@misc{qwen3_8b_xlam_unsloth_2026_gguf,
  author       = {Ermia Azarkhalili},
  title        = {Qwen3-8B-xLAM-Unsloth --- GGUF quantized},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen3-8B-Function-Calling-xLAM-Unsloth-GGUF}}
}
```
---
This Qwen3 model was trained 2× faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)