Initialize project; model provided by the ModelHub XC community
Model: Featherlabs/Aura-7b-GGUF · Source: Original Platform
README.md (new file, +154 lines)
---
language:
- en
license: apache-2.0
base_model: Featherlabs/Aura-7b
tags:
- gguf
- qwen2
- agentic
- function-calling
- tool-use
- conversational
- featherlabs
- llama-cpp
- ollama
- lm-studio
pipeline_tag: text-generation
---

<div align="center">

# ⚡ Aura-7b GGUF

### *A small model that punches above its weight — now optimized for local inference*

**Agentic · Tool Use · Function Calling · Reasoning**

[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0) · [Base Model: Featherlabs/Aura-7b](https://huggingface.co/Featherlabs/Aura-7b)

*Built by [Featherlabs](https://huggingface.co/Featherlabs) · Operated by Owlkun*

</div>

---

## ✨ Overview

This repository contains **GGUF quantized versions** of **[Featherlabs/Aura-7b](https://huggingface.co/Featherlabs/Aura-7b)** — an agentic 7B language model fine-tuned from Qwen2.5-7B-Instruct by [Featherlabs](https://huggingface.co/Featherlabs).

These models are optimized for efficient local execution on consumer hardware using CPU or GPU acceleration. They are fully compatible with [llama.cpp](https://github.com/ggerganov/llama.cpp), [Ollama](https://ollama.com), [LM Studio](https://lmstudio.ai), [Jan](https://jan.ai), and other GGUF-based runtimes.

---

## 📦 Available Quantizations

Choose the file that best matches your system's VRAM/RAM capacity:

| Filename | Size | VRAM Req. | Quality | Best For |
|:---------|:----:|:---------:|:-------:|:---------|
| `aura-7b-f16.gguf` | ~15.2 GB | ~16 GB | ⭐⭐⭐⭐⭐ | Maximum quality, high-VRAM systems |
| `aura-7b-q8_0.gguf` | ~8.1 GB | ~10 GB | ⭐⭐⭐⭐⭐ | Near-lossless quality |
| `aura-7b-q6_k.gguf` | ~6.25 GB | ~8 GB | ⭐⭐⭐⭐ | Excellent quality, sweet spot for 8 GB GPUs |
| `aura-7b-q4_k_m.gguf` | ~4.68 GB | ~6 GB | ⭐⭐⭐⭐ | 🏆 **Recommended for most users** (MacBook Air, RTX 3060/4060) |
| `aura-7b-q2_k.gguf` | ~3.02 GB | ~4 GB | ⭐⭐⭐ | Minimum RAM / CPU-only execution |

> 💡 **Tip:** If you have an 8 GB GPU, `Q6_K` fits while offloading all layers. With 6 GB or less, use `Q4_K_M`. To fetch a single file without cloning the whole repository, see the sketch below.
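
The Hugging Face CLI can download one quantization at a time. A minimal sketch, assuming `huggingface-cli` is installed (`pip install -U huggingface_hub`) and the filenames match the table above:

```bash
# Download only the recommended Q4_K_M quantization into the current directory
huggingface-cli download Featherlabs/Aura-7b-GGUF aura-7b-q4_k_m.gguf --local-dir .
```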
---

## 🚀 Quick Start / Usage

### 🦙 llama.cpp

The basic command for interactive terminal chat:

```bash
# Interactive chat with the Q4_K_M quantization
./llama-cli \
  -m aura-7b-q4_k_m.gguf \
  -p "You are Aura, a helpful agentic AI assistant created by Featherlabs." \
  --ctx-size 8192 \
  -b 512 \
  -n -1 \
  -i --color
```

*(Add `-ngl 99` to offload all layers to your GPU if supported.)*
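
If you want an HTTP endpoint rather than a terminal session, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible API. A minimal sketch, assuming a recent llama.cpp build:

```bash
# Serve the model on localhost:8080 (add -ngl 99 to offload layers to a GPU)
./llama-server \
  -m aura-7b-q4_k_m.gguf \
  --ctx-size 8192 \
  --port 8080
```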
### 🦙 Ollama

Creating a custom Ollama model is the easiest way to serve the API locally:

1. Create a file named `Modelfile` in the same directory as the GGUF:
```dockerfile
FROM ./aura-7b-q4_k_m.gguf

# Set the system prompt
SYSTEM "You are Aura, a helpful agentic AI assistant created by Featherlabs."

# Set standard parameters
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
PARAMETER top_p 0.9

# The chat template is usually auto-detected for Qwen2, but you can set it explicitly if needed
TEMPLATE """{{ if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ .Response }}<|im_end|>
"""
```

2. Build and run:
```bash
ollama create aura-7b -f Modelfile
ollama run aura-7b
```
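
Once created, the model is also reachable through Ollama's REST API on `localhost:11434`, for example:

```bash
# One-shot chat request against the locally served model
curl http://localhost:11434/api/chat -d '{
  "model": "aura-7b",
  "messages": [
    {"role": "user", "content": "List three creative uses for a paperclip."}
  ],
  "stream": false
}'
```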
### 🖥️ LM Studio

1. Open LM Studio and search for `Featherlabs/Aura-7b-GGUF` (or drag and drop the `.gguf` file).
2. Download your preferred quantization (e.g., `Q4_K_M`).
3. Go to the Chat tab and load the model.
4. From the right panel, select the **Qwen2** chat template (or set the system prompt manually).
5. Start chatting, or query the local server as sketched below.
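
LM Studio can also serve the loaded model over its OpenAI-compatible local server (enable it from the server/developer view; it defaults to port 1234). A minimal sketch; the `model` value below is an assumption and should match the identifier LM Studio shows for your download:

```bash
# Query LM Studio's OpenAI-compatible endpoint
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "aura-7b-q4_k_m",
    "messages": [{"role": "user", "content": "Hello, Aura!"}]
  }'
```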
---

## 📊 Model Details

| Property | Value |
|---|---|
| **Base Model** | [Featherlabs/Aura-7b](https://huggingface.co/Featherlabs/Aura-7b) |
| **Architecture** | Qwen2 |
| **Parameters** | ~7.6B |
| **Context length** | 8192 tokens |
| **Quantization tool** | `llama.cpp` |
| **Format** | GGUF (v3) |

---

## 👑 Original Model (Safetensors)

If you need the full-precision `BF16` weights for fine-tuning, training, or deployment in production clusters (vLLM, TGI, SGLang):

👉 **[Featherlabs/Aura-7b](https://huggingface.co/Featherlabs/Aura-7b)**
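
For instance, a minimal vLLM sketch (assumes a recent vLLM release with the `vllm serve` entry point; it serves an OpenAI-compatible API on port 8000 by default):

```bash
# Serve the original BF16 weights, capped at the model card's 8192-token context
vllm serve Featherlabs/Aura-7b --max-model-len 8192
```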
---

## 📜 License

Apache 2.0 — consistent with [Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct).

---

<div align="center">

**Built with ❤️ by [Featherlabs](https://huggingface.co/Featherlabs)**

*Operated by Owlkun*

</div>