---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- granite
- gguf
- llama-cpp
- reasoning
- quantized
- local-llm
base_model: Avtrkrb/granite-claude-h-350m
library_name: gguf
---

# granite-claude-h-350m-GGUF

GGUF quantizations of:

`Avtrkrb/granite-claude-h-350m`

These files are intended for inference using:

- llama.cpp
- LM Studio
- Open WebUI
- Jan
- KoboldCpp
- GPT4All
- Ollama (after conversion/import)

---

## Available Quantizations

Typical variants included:

| Quant  | Use Case                           |
|--------|------------------------------------|
| Q4_K_M | Best size / quality balance        |
| Q5_K_M | Higher quality                     |
| Q6_K   | Near-lossless for most use cases   |
| Q8_0   | Highest quality quantized version  |

---

## Source Model

Merged model:
https://huggingface.co/Avtrkrb/granite-claude-h-350m

Dataset:
https://huggingface.co/datasets/Avtrkrb/combined-reasoning-claude

---

## Example llama.cpp Usage

```bash
./llama-cli \
  -m granite-claude-h-350m-Q4_K_M.gguf \
  -p "Explain quantum tunneling."
```

---

## Recommended Quant

For most users:

**Q4_K_M**

offers the best balance between:

- quality
- speed
- memory usage

---

## License

This repository follows the licensing terms of the original Granite model.
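
---

## Example Quant File Selection

The llama.cpp example above hard-codes the Q4_K_M file. As a minimal sketch, the snippet below builds the filename for any quant listed in the table and prints the matching `llama-cli` command without running it. It assumes every file follows the same naming pattern as the Q4_K_M example, which this card does not confirm; `MODEL_BASE`, `QUANT`, and `gguf_name` are hypothetical names introduced only for illustration.

```bash
#!/usr/bin/env bash
# Assumption: all quant files are named like the Q4_K_M example,
# i.e. granite-claude-h-350m-<QUANT>.gguf. Check the repository file list.
MODEL_BASE="granite-claude-h-350m"

# Hypothetical helper: map a quant label from the table to a GGUF filename.
# Returns non-zero for labels that are not listed in this card.
gguf_name() {
  case "$1" in
    Q4_K_M|Q5_K_M|Q6_K|Q8_0)
      printf '%s-%s.gguf\n' "$MODEL_BASE" "$1"
      ;;
    *)
      echo "unknown quant: $1" >&2
      return 1
      ;;
  esac
}

# Default to the recommended quant; override with e.g. QUANT=Q8_0.
QUANT="${QUANT:-Q4_K_M}"

# Print the command instead of executing it, so the sketch is safe to run
# even when llama.cpp or the model file is not present.
if MODEL_FILE="$(gguf_name "$QUANT")"; then
  echo "./llama-cli -m $MODEL_FILE -p \"Explain quantum tunneling.\""
fi
```

Running the script with `QUANT=Q6_K` set in the environment prints the command for the Q6_K file instead; copy the printed line into a shell where `llama-cli` and the downloaded file are available.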