---
language:
- en
license: apache-2.0
base_model: Featherlabs/Aura-7b
tags:
- gguf
- qwen2
- agentic
- function-calling
- tool-use
- conversational
- featherlabs
- llama-cpp
- ollama
- lm-studio
pipeline_tag: text-generation
---

# Aura-7b GGUF

A small model that punches above its weight — now optimized for local inference.

Agentic · Tool Use · Function Calling · Reasoning


Built by Featherlabs · Operated by Owlkun


## Overview

This repository contains GGUF quantized versions of Featherlabs/Aura-7b, an agentic 7B language model fine-tuned by Featherlabs from Qwen2.5-7B-Instruct.

These models are optimized for efficient local execution on consumer hardware using CPU or GPU acceleration. They are fully compatible with llama.cpp, Ollama, LM Studio, Jan, and other GGUF-based runtimes.


## 📦 Available Quantizations

Choose the file that best matches your system's VRAM/RAM capacity:

| Filename | Size | VRAM Req | Quality / Best For |
|---|---|---|---|
| `aura-7b-f16.gguf` | ~15.2 GB | ~16 GB | Maximum quality; high-VRAM systems |
| `aura-7b-q8_0.gguf` | ~8.1 GB | ~10 GB | Near-lossless quality |
| `aura-7b-q6_k.gguf` | ~6.25 GB | ~8 GB | Excellent quality; sweet spot for 8 GB GPUs |
| `aura-7b-q4_k_m.gguf` | ~4.68 GB | ~6 GB | 🏆 Recommended for most users (MacBook Air, RTX 3060/4060) |
| `aura-7b-q2_k.gguf` | ~3.02 GB | ~4 GB | Minimum RAM / CPU-only execution |

> 💡 **Tip:** If you have an 8 GB GPU, Q6_K fits with all layers offloaded. With 6 GB or less, use Q4_K_M.
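The table above amounts to a simple "largest file that fits" rule. A minimal Python sketch of that rule (the helper name is illustrative, not part of any Featherlabs tooling; sizes and VRAM figures are copied from the table):

```python
# Quantization picker sketch: returns the highest-quality GGUF file
# whose suggested VRAM requirement fits the available memory.
# Figures taken from the table in this card.

QUANTS = [
    # (filename, file size in GB, suggested VRAM in GB), best quality first
    ("aura-7b-f16.gguf", 15.2, 16),
    ("aura-7b-q8_0.gguf", 8.1, 10),
    ("aura-7b-q6_k.gguf", 6.25, 8),
    ("aura-7b-q4_k_m.gguf", 4.68, 6),
    ("aura-7b-q2_k.gguf", 3.02, 4),
]

def pick_quant(vram_gb: float) -> str:
    """Pick the best quantization that fits in vram_gb of memory."""
    for name, _size, needed in QUANTS:
        if vram_gb >= needed:
            return name
    # Nothing fits comfortably: fall back to the smallest file (CPU-only use)
    return QUANTS[-1][0]
```

For example, `pick_quant(8)` selects `aura-7b-q6_k.gguf`, matching the tip above.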


## 🚀 Quick Start / Usage

### 🦙 llama.cpp

The basic command for interactive terminal chat:

```sh
./llama-cli \
  -m aura-7b-q4_k_m.gguf \
  -p "You are Aura, a helpful agentic AI assistant created by Featherlabs." \
  --ctx-size 8192 \
  -b 512 \
  -n -1 \
  -i --color
```

(Add `-ngl 99` to offload all layers to your GPU if supported.)

### 🦙 Ollama

Creating a custom Ollama model is the easiest way to serve the API locally:

1. Create a file named `Modelfile` in the same directory as the GGUF:

```text
FROM ./aura-7b-q4_k_m.gguf

# Set the system prompt
SYSTEM "You are Aura, a helpful agentic AI assistant created by Featherlabs."

# Set standard parameters
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
PARAMETER top_p 0.9

# The chat template is usually auto-detected for Qwen2, but you can set it explicitly if needed
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
```
2. Build and run:

```sh
ollama create aura-7b -f Modelfile
ollama run aura-7b
```
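The `TEMPLATE` above renders standard ChatML, the prompt format Qwen2-family models expect. Written out in plain Python (the function name is illustrative), the rendering the runtime sends for generation looks like this:

```python
# ChatML prompt rendering, mirroring the Modelfile TEMPLATE above.
# The prompt sent for generation ends at the assistant header; the model
# then generates until it emits <|im_end|>.

def render_chatml(system: str, prompt: str) -> str:
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    if prompt:
        parts.append(f"<|im_start|>user\n{prompt}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(render_chatml("You are Aura.", "Hello!"))
```

This is also what to reproduce if you drive the model through a raw completion API instead of a chat endpoint.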

### 🖥️ LM Studio

  1. Open LM Studio and search for Featherlabs/Aura-7b-GGUF (or drag and drop the .gguf file).
  2. Download your preferred quantization (e.g., Q4_K_M).
  3. Go to the Chat tab and load the model.
  4. From the right panel, select the Qwen2 chat template (or set the system prompt manually).
  5. Start chatting!
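LM Studio can also serve the loaded model over its OpenAI-compatible local server (default `http://localhost:1234/v1`). A sketch of the request body for a chat-completions call (the `model` identifier is an assumption; use whatever name LM Studio shows for the loaded file):

```python
# Build an OpenAI-style chat-completions payload for LM Studio's local server.
# Constructed offline here; POST it to http://localhost:1234/v1/chat/completions
# once the server is running.
import json

payload = {
    "model": "aura-7b-q4_k_m",  # assumption: the identifier LM Studio assigns
    "messages": [
        {"role": "system",
         "content": "You are Aura, a helpful agentic AI assistant created by Featherlabs."},
        {"role": "user", "content": "List three uses of function calling."},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}
body = json.dumps(payload)
```

Any OpenAI-compatible client library can send this payload by pointing its base URL at the local server.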

## 📊 Model Details

| Property | Value |
|---|---|
| Base Model | Featherlabs/Aura-7b |
| Architecture | Qwen2 |
| Parameters | ~7.6B |
| Context length | 8192 tokens |
| Quantization tool | llama.cpp |
| Format | GGUF (v3) |
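As a sanity check, the file sizes in the quantization table are consistent with these details: dividing file size by parameter count gives the effective bits per weight. A quick sketch (assuming ~7.6B parameters, the Qwen2.5-7B figure, and reading the table's sizes as decimal GB):

```python
# Rough bits-per-weight implied by a GGUF file size.
# ~7.6e9 parameters is the Qwen2.5-7B figure (an assumption for Aura-7b);
# f16 at ~15.2 GB matches 2 bytes per weight, which supports it.

PARAMS = 7.6e9

def bits_per_weight(file_size_gb: float) -> float:
    return file_size_gb * 1e9 * 8 / PARAMS

bpw_f16 = bits_per_weight(15.2)  # aura-7b-f16.gguf  -> ~16 bits/weight
bpw_q4 = bits_per_weight(4.68)   # aura-7b-q4_k_m.gguf -> ~4.9 bits/weight
```

The Q4_K_M figure lands near 5 bits per weight because K-quants mix 4-bit blocks with higher-precision scales and a few higher-bit tensors.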

## 👑 Original Model (Safetensors)

If you need the full-precision BF16 weights for fine-tuning, training, or deployment in production clusters (vLLM, TGI, SGLang):

👉 Featherlabs/Aura-7b


## 📜 License

Apache 2.0 — consistent with Qwen2.5-7B-Instruct.


Built with ❤️ by Featherlabs

Operated by Owlkun
