Initialize project; model provided by the ModelHub XC community
Model: vanta-research/wraith-8b Source: Original Platform
# GGUF Quantized Models

For optimal inference performance, we provide GGUF quantized versions of Wraith-8B.

## Available Models
### Recommended: Q4_K_M (4.7GB)

- **Best balance** of quality and speed
- **File:** `wraith-8b-Q4_K_M.gguf`
- **Size:** 4.7GB
- **Performance:** ~3.6s per response
- **Quality:** No degradation vs FP16 on benchmarks
### Full Precision: FP16 (16GB)

- **Highest quality** (though Q4_K_M shows no loss)
- **File:** `wraith-8b-fp16.gguf`
- **Size:** 16GB
- **Performance:** ~50s per response (CPU offloading)
- **Use case:** Research/analysis only
## Download

Due to file size, GGUF models are stored separately:

```bash
# Download Q4_K_M (recommended)
wget https://huggingface.co/NeuroForge/Wraith-8B/resolve/main/gguf/wraith-8b-Q4_K_M.gguf

# Or use huggingface-cli
huggingface-cli download NeuroForge/Wraith-8B gguf/wraith-8b-Q4_K_M.gguf
```
## Usage with llama.cpp

```bash
./llama-cli -m wraith-8b-Q4_K_M.gguf \
  -p "Calculate the area of a circle with radius 5cm." \
  -n 512 \
  --temp 0.7 \
  --top-p 0.9
```
## Usage with Ollama

See main README for Modelfile template and setup instructions.
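The main README's Modelfile template is the source of truth; as a minimal sketch, a Modelfile for the Q4_K_M file might look like the following (the sampling parameters mirror the llama.cpp example above and are illustrative, not prescribed):

```
# Point Ollama at the local GGUF file.
FROM ./wraith-8b-Q4_K_M.gguf

# Illustrative sampling defaults, matching the llama.cpp example.
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```

You would then register and run it with `ollama create wraith-8b -f Modelfile` followed by `ollama run wraith-8b`.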
## Benchmarks

All benchmark results in the main model card were achieved using the Q4_K_M quantization:

- GSM8K: 70%
- MMLU: 66.4%
- TruthfulQA: 58.5%

**Conclusion:** Q4_K_M provides full model quality at 29% of the size.
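As a quick sanity check on the size claim, a minimal sketch using the rounded sizes quoted above (not measured file sizes):

```python
# Rounded model sizes from the tables above, in GB.
q4_k_m_gb = 4.7
fp16_gb = 16.0

# Q4_K_M's footprint as a fraction of full precision.
ratio = q4_k_m_gb / fp16_gb
print(f"Q4_K_M is {ratio:.0%} of the FP16 size")  # 4.7 / 16 ≈ 29%
```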