---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- granite
- gguf
- llama-cpp
- reasoning
- quantized
- local-llm

base_model: Avtrkrb/granite-claude-h-350m

library_name: gguf
---
# granite-claude-h-350m-GGUF

GGUF quantizations of:

`Avtrkrb/granite-claude-h-350m`

These files are intended for inference with:

- llama.cpp
- LM Studio
- Open WebUI
- Jan
- KoboldCpp
- GPT4All
- Ollama (after import via a Modelfile)
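
For the Ollama route, a minimal sketch of the import step might look like the following. It assumes the Q4_K_M file (named as in the llama.cpp example in this card) is already in the current directory; the local model name `granite-claude-h-350m` is a hypothetical choice, not something this repository defines. The Ollama commands only run when Ollama is installed and the file is present.

```bash
#!/usr/bin/env bash
# Hypothetical local model name; any name works.
MODEL_NAME="granite-claude-h-350m"
# Assumption: the Q4_K_M file was already downloaded into the current directory.
GGUF_FILE="./granite-claude-h-350m-Q4_K_M.gguf"

# A Modelfile only needs a FROM line that points at the GGUF file.
printf 'FROM %s\n' "$GGUF_FILE" > Modelfile

# Register and run the model only when Ollama is installed and the file exists.
if command -v ollama >/dev/null 2>&1 && [ -f "$GGUF_FILE" ]; then
  ollama create "$MODEL_NAME" -f Modelfile
  ollama run "$MODEL_NAME" "Explain quantum tunneling."
else
  echo "Modelfile written; install Ollama and download the GGUF file to continue."
fi
```
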
---
## Available Quantizations

Typical variants included:

| Quant | Use Case |
|---------|---------|
| Q4_K_M | Best size / quality balance |
| Q5_K_M | Higher quality, larger file |
| Q6_K | Near-lossless for most use cases |
| Q8_0 | Highest quality quantized version, largest file |
---
## Source Model

Merged model:

https://huggingface.co/Avtrkrb/granite-claude-h-350m

Dataset:

https://huggingface.co/datasets/Avtrkrb/combined-reasoning-claude

---
## Example llama.cpp Usage

```bash
./llama-cli \
  -m granite-claude-h-350m-Q4_K_M.gguf \
  -p "Explain quantum tunneling."
```
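
The command above expects the GGUF file to be available locally. As a rough sketch of one way to fetch it, the script below builds a direct-download URL using the Hugging Face `resolve/main` path. The repository ID `Avtrkrb/granite-claude-h-350m-GGUF` is an assumption inferred from this card's title and the base model's owner, and the helper `gguf_url` is a hypothetical name introduced for illustration. By default the script only prints the URL; set `DOWNLOAD=1` to fetch the file with `curl`.

```bash
#!/usr/bin/env bash
# Assumption: repository ID inferred from this card's title and the base
# model's owner; adjust it if the files live elsewhere.
REPO_ID="Avtrkrb/granite-claude-h-350m-GGUF"

# Build the direct-download URL for one quantization (e.g. Q4_K_M).
# File naming follows the Q4_K_M example used in the llama.cpp command.
gguf_url() {
  quant="$1"
  printf 'https://huggingface.co/%s/resolve/main/granite-claude-h-350m-%s.gguf\n' \
    "$REPO_ID" "$quant"
}

# Print the URL by default; set DOWNLOAD=1 to actually fetch the file.
QUANT="${QUANT:-Q4_K_M}"
if [ "${DOWNLOAD:-0}" = "1" ]; then
  curl -L --fail -O "$(gguf_url "$QUANT")"
else
  gguf_url "$QUANT"
fi
```
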
---
## Recommended Quant

For most users, **Q4_K_M** offers the best balance between:

- quality
- speed
- memory usage
---
## License

This repository follows the licensing terms of the original Granite model (`apache-2.0`, as declared in this card's metadata).