Initialize project; model provided by the ModelHub XC community
Model: vanta-research/wraith-8b Source: Original Platform
# GGUF Quantized Models

For optimal inference performance, we provide GGUF quantized versions of Wraith-8B.

## Available Models
### Recommended: Q4_K_M (4.7GB)

- **Best balance** of quality and speed
- **File:** `wraith-8b-Q4_K_M.gguf`
- **Size:** 4.7GB
- **Performance:** ~3.6s per response
- **Quality:** No degradation vs FP16 on benchmarks
### Full Precision: FP16 (16GB)

- **Highest quality** (though Q4_K_M shows no loss)
- **File:** `wraith-8b-fp16.gguf`
- **Size:** 16GB
- **Performance:** ~50s per response (CPU offloading)
- **Use case:** Research/analysis only
## Download

Due to file size, GGUF models are stored separately:

```bash
# Download Q4_K_M (recommended)
wget https://huggingface.co/NeuroForge/Wraith-8B/resolve/main/gguf/wraith-8b-Q4_K_M.gguf

# Or use huggingface-cli
huggingface-cli download NeuroForge/Wraith-8B gguf/wraith-8b-Q4_K_M.gguf
```
## Usage with llama.cpp

```bash
./llama-cli -m wraith-8b-Q4_K_M.gguf \
  -p "Calculate the area of a circle with radius 5cm." \
  -n 512 \
  --temp 0.7 \
  --top-p 0.9
```
## Usage with Ollama

See main README for Modelfile template and setup instructions.
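The main README's Modelfile template is the source of truth; as a minimal sketch, a Modelfile for the Q4_K_M file might look like the following (the sampling parameters mirror the llama.cpp example above and are illustrative, not prescribed):

```
# Point Ollama at the local GGUF file.
FROM ./wraith-8b-Q4_K_M.gguf

# Illustrative sampling defaults, matching the llama.cpp example.
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```

You would then register and run it with `ollama create wraith-8b -f Modelfile` followed by `ollama run wraith-8b`.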
## Benchmarks

All benchmark results in the main model card were achieved using the Q4_K_M quantization:

- GSM8K: 70%
- MMLU: 66.4%
- TruthfulQA: 58.5%

**Conclusion:** Q4_K_M provides full model quality at 29% of the size.
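As a quick sanity check on the size claim, a minimal sketch using the rounded sizes quoted above (not measured file sizes):

```python
# Rounded model sizes from the tables above, in GB.
q4_k_m_gb = 4.7
fp16_gb = 16.0

# Q4_K_M's footprint as a fraction of full precision.
ratio = q4_k_m_gb / fp16_gb
print(f"Q4_K_M is {ratio:.0%} of the FP16 size")  # 4.7 / 16 ≈ 29%
```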