---
language:
- en
license: apache-2.0
base_model: Featherlabs/Aura-7b
tags:
- gguf
- qwen2
- agentic
- function-calling
- tool-use
- conversational
- featherlabs
- llama-cpp
- ollama
- lm-studio
pipeline_tag: text-generation
---
# ⚡ Aura-7b GGUF

### *A small model that punches above its weight — now optimized for local inference*

**Agentic · Tool Use · Function Calling · Reasoning**

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Base Model](https://img.shields.io/badge/Base-Featherlabs/Aura--7b-purple)](https://huggingface.co/Featherlabs/Aura-7b)
[![Quantization](https://img.shields.io/badge/Format-GGUF-orange)](#)

*Built by [Featherlabs](https://huggingface.co/Featherlabs) · Operated by Owlkun*
---

## ✨ Overview

This repository contains **GGUF quantized versions** of **[Featherlabs/Aura-7b](https://huggingface.co/Featherlabs/Aura-7b)** — an agentic 7B language model fine-tuned from Qwen2.5-7B-Instruct by [Featherlabs](https://huggingface.co/Featherlabs).

These models are optimized for efficient local execution on consumer hardware using CPU or GPU acceleration. They are fully compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.com), [LM Studio](https://lmstudio.ai), [Jan](https://jan.ai), and other GGUF-based runtimes.

---

## 📦 Available Quantizations

Choose the file that best matches your system's VRAM/RAM capacity:

| Filename | Size | VRAM Req | Quality | Best For |
|:---------|:----:|:-------:|:-------:|:---------|
| `aura-7b-f16.gguf` | ~15.2 GB | ~16 GB | ⭐⭐⭐⭐⭐ | Maximum quality, high-VRAM systems |
| `aura-7b-q8_0.gguf` | ~8.1 GB | ~10 GB | ⭐⭐⭐⭐⭐ | Near-lossless quality |
| `aura-7b-q6_k.gguf` | ~6.25 GB | ~8 GB | ⭐⭐⭐⭐ | Excellent quality, sweet spot for 8 GB GPUs |
| `aura-7b-q4_k_m.gguf` | ~4.68 GB | ~6 GB | ⭐⭐⭐⭐ | 🏆 **Recommended for most users** (MacBook Air, RTX 3060/4060) |
| `aura-7b-q2_k.gguf` | ~3.02 GB | ~4 GB | ⭐⭐⭐ | Minimum RAM / CPU-only execution |

> 💡 **Tip:** If you have an 8 GB GPU, `Q6_K` fits with all layers offloaded. If you have 6 GB or less, use `Q4_K_M`. To fetch a single file from the command line, see the download sketch under Additional Examples at the end of this card.

---

## 🚀 Quick Start / Usage

### 🦙 llama.cpp

The basic command for interactive terminal chat:

```bash
./llama-cli \
  -m aura-7b-q4_k_m.gguf \
  -p "You are Aura, a helpful agentic AI assistant created by Featherlabs." \
  --ctx-size 8192 \
  -b 512 \
  -n -1 \
  -i --color
```

*(Add `-ngl 99` to offload all layers to your GPU if supported.)* To serve the model as an OpenAI-compatible API instead, see the `llama-server` sketch under Additional Examples below.

### 🦙 Ollama

Creating a custom Ollama model is the easiest way to serve the API locally:

1. Create a file named `Modelfile` in the same directory as the GGUF:

```dockerfile
FROM ./aura-7b-q4_k_m.gguf

# Set the system prompt
SYSTEM "You are Aura, a helpful agentic AI assistant created by Featherlabs."

# Set standard parameters
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
PARAMETER top_p 0.9

# The chat template is usually auto-detected for Qwen2, but you can set it explicitly if needed
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
```

2. Build and run:

```bash
ollama create aura-7b -f Modelfile
ollama run aura-7b
```

Once running, the model can also be queried over Ollama's local REST API; see the sketch under Additional Examples below.

### 🖥️ LM Studio

1. Open LM Studio and search for `Featherlabs/Aura-7b-GGUF` (or drag and drop the `.gguf` file).
2. Download your preferred quantization (e.g., `Q4_K_M`).
3. Go to the Chat tab and load the model.
4. From the right panel, select the **Qwen2** chat template (or set the system prompt manually).
5. Start chatting!

---

## 📊 Model Details

| Property | Value |
|---|---|
| **Base Model** | [Featherlabs/Aura-7b](https://huggingface.co/Featherlabs/Aura-7b) |
| **Architecture** | Qwen2 |
| **Parameters** | ~7.6B |
| **Context length** | 8192 tokens |
| **Quantization tool** | `llama.cpp` |
| **Format** | GGUF (v3) |

---

## 👑 Original Model (Safetensors)

If you need the full-precision `BF16` weights for fine-tuning, further training, or deployment in production clusters (vLLM, TGI, SGLang):

👉 **[Featherlabs/Aura-7b](https://huggingface.co/Featherlabs/Aura-7b)**

---

## 📜 License

Apache 2.0 — consistent with [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

---
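## 🧪 Additional Examples

### ⬇️ Download a single quantization from the CLI

A minimal sketch using the Hugging Face CLI, so you don't have to pull the whole repository. The repo ID `Featherlabs/Aura-7b-GGUF` follows the search name used in the LM Studio step above; swap in a different filename for another quant:

```bash
# Requires the Hugging Face CLI: pip install -U "huggingface_hub[cli]"
# Fetch only the recommended Q4_K_M file into the current directory
huggingface-cli download Featherlabs/Aura-7b-GGUF \
  aura-7b-q4_k_m.gguf \
  --local-dir .
```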
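### 🌐 Serve an OpenAI-compatible API with llama.cpp

`llama-server` (shipped alongside `llama-cli`) exposes the model over an OpenAI-compatible HTTP endpoint. A minimal sketch, assuming the Q4_K_M file and the default host; the request content is illustrative only:

```bash
# Start the server (add -ngl 99 to offload all layers to the GPU if supported)
./llama-server -m aura-7b-q4_k_m.gguf --ctx-size 8192 --port 8080

# In another terminal, query the OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are Aura, a helpful agentic AI assistant created by Featherlabs."},
      {"role": "user", "content": "List three good uses for a local 7B model."}
    ],
    "temperature": 0.7
  }'
```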
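### 🔌 Query the Ollama REST API

Once `ollama run aura-7b` (or `ollama serve`) is up, the same model answers over Ollama's local REST API on port 11434. A short sketch; the model name matches the `ollama create` step above:

```bash
# Non-streaming chat request against the local Ollama daemon
curl http://localhost:11434/api/chat -d '{
  "model": "aura-7b",
  "messages": [
    {"role": "user", "content": "Summarize function calling in one sentence."}
  ],
  "stream": false
}'
```

---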
**Built with ❤️ by [Featherlabs](https://huggingface.co/Featherlabs)**

*Operated by Owlkun*