| Field | Value |
|---|---|
| license | other |
| library_name | gguf |
| tags | |
| pipeline_tag | text-generation |
| language | |
| base_model_relation | quantized |
| base_model | gss1147/Llama-3.2-OctoThinker-iNano-1B |
# Llama-3.2-OctoThinker-iNano-1B-GGUF

## Model Summary
Llama-3.2-OctoThinker-iNano-1B-GGUF is the GGUF-quantized release of the main model:
https://huggingface.co/gss1147/Llama-3.2-OctoThinker-iNano-1B
This repository packages the model for efficient local inference in GGUF-compatible runtimes such as llama.cpp, LM Studio, and similar local tools.
## Link to the Main Model
This GGUF repository corresponds to the main model repo:
gss1147/Llama-3.2-OctoThinker-iNano-1B
If you want the original non-GGUF model, training/merge details, tokenizer files, and main repository metadata, use the repo above.
## Available Files

This GGUF repository currently includes the following files (a download sketch follows the list):
- Q4_K_M — 955 MB
- Q5_K_M — 1.09 GB
- F16 — 3 GB
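To fetch a single quantization without cloning the whole repository, you can use the Hugging Face CLI. A minimal sketch, assuming `huggingface-cli` (from the `huggingface_hub` package) is installed and that the file name matches this repo's listing:

```bash
# Download only the Q4_K_M file into ./models
huggingface-cli download gss1147/Llama-3.2-OctoThinker-iNano-1B-GGUF \
  Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf \
  --local-dir ./models
```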
## Architecture
- Architecture: llama
- Model size: 1B params
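Both values can be read straight from the GGUF header. A minimal sketch using the `gguf-dump` tool from the `gguf` Python package (the file path is an assumption; adjust to where you saved the file):

```bash
pip install gguf
# Print GGUF metadata (architecture, parameter count, quantization type) without tensor details
gguf-dump --no-tensors ./models/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf
```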
## Intended Use
This model is intended for:
- local text generation
- assistant-style prompting (see the chat sketch after this list)
- lightweight reasoning tasks
- summarization
- simple coding help
- offline/local inference workflows
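For assistant-style prompting, llama.cpp's conversation mode applies the model's chat template automatically. A minimal sketch, assuming a recent llama.cpp build and the Q4_K_M file downloaded as shown earlier:

```bash
# -cnv starts an interactive chat, -c sets the context window, --temp the sampling temperature;
# in conversation mode, -p is used as the system prompt
llama-cli -m ./models/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf \
  -cnv -c 4096 --temp 0.7 \
  -p "You are a concise, helpful assistant."
```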
## Quantization Notes

Choose the file that best matches your hardware (a quality-comparison sketch follows this list):
- Q4_K_M for smaller size and lighter RAM usage
- Q5_K_M for a stronger quality-to-size balance
- F16 for the highest-fidelity file in this repo, with much higher memory requirements
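To quantify the quality gap between these files on your own data, llama.cpp ships a perplexity tool. A minimal sketch, assuming a build that provides `llama-perplexity` and a plain-text evaluation file (`eval.txt` is a placeholder):

```bash
# Lower perplexity means closer to the F16 reference; run once per quantization and compare
llama-perplexity -m ./models/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf -f eval.txt
llama-perplexity -m ./models/Llama-3.2-OctoThinker-iNano-1B.Q5_K_M.gguf -f eval.txt
```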
## Example llama.cpp Usage

```bash
llama-cli -m /path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf \
  -p "Explain recursion in Python with a simple example."
```
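The same file can also be served over an HTTP API. A minimal sketch, assuming a llama.cpp build that includes `llama-server` (the port and context size are arbitrary choices):

```bash
# Serve the model locally with an OpenAI-compatible API
llama-server -m /path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf -c 4096 --port 8080

# Query it from another shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize GGUF in two sentences."}]}'
```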
## Description