---
license: other
library_name: gguf
tags:
- gguf
- llama
- llama-3.2
- text-generation
- local-llm
- llama-cpp
- lm-studio
- ollama
- 1b
pipeline_tag: text-generation
language:
- en
base_model_relation: quantized
base_model:
- gss1147/Llama-3.2-OctoThinker-iNano-1B
---

# Llama-3.2-OctoThinker-iNano-1B-GGUF

## Model Summary

**Llama-3.2-OctoThinker-iNano-1B-GGUF** is the GGUF quantized release of the main model.

**Main model repo:** https://huggingface.co/gss1147/Llama-3.2-OctoThinker-iNano-1B

This repository packages the model for efficient **local inference** in GGUF-compatible runtimes such as **llama.cpp**, **LM Studio**, and similar local tools.

## GGUF to Main Model Link

This GGUF repository corresponds to the main model repo:

**[`gss1147/Llama-3.2-OctoThinker-iNano-1B`](https://huggingface.co/gss1147/Llama-3.2-OctoThinker-iNano-1B)**

If you want the original non-GGUF model, training/merge details, tokenizer files, and main repository metadata, use the repo above.

## Available Files

This GGUF repository currently includes:

- **Q4_K_M**: 955 MB
- **Q5_K_M**: 1.09 GB
- **F16**: 3 GB

## Architecture

- **Architecture:** llama
- **Model size:** 1B params

## Intended Use

This model is intended for:

- local text generation
- assistant-style prompting
- lightweight reasoning tasks
- summarization
- simple coding help
- offline/local inference workflows

## Quantization Notes

Choose the file that best matches your hardware:

- **Q4_K_M** for smaller size and lighter RAM usage
- **Q5_K_M** for a stronger quality-to-size balance
- **F16** for the highest-fidelity file in this repo, with much higher memory requirements

## Example llama.cpp Usage

```bash
llama-cli -m /path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf \
  -p "Explain recursion in Python with a simple example."
```
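
## Example llama-server Usage

For an API-style workflow, `llama-server` (which ships with llama.cpp) can expose the model over an OpenAI-compatible HTTP endpoint. The sketch below is illustrative: the context size, port, and prompt are arbitrary choices, and the GGUF path assumes the Q4_K_M filename from the example above.

```bash
# Serve the model locally (llama-server is part of llama.cpp).
llama-server -m /path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf \
  -c 4096 --port 8080

# In another shell, query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Summarize what GGUF is in two sentences."}
        ],
        "temperature": 0.7
      }'
```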
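
## Example Ollama Usage

Since this repo is also tagged for Ollama, a downloaded GGUF can be imported via a Modelfile. This is a minimal sketch: it assumes the Q4_K_M file sits in the current directory, the local name `octothinker-inano-1b` is a hypothetical label of your choosing, and you may need to add `TEMPLATE`/`PARAMETER` lines so the chat format matches the model.

```bash
# Write a minimal Modelfile pointing at the local GGUF
# (filename assumed from the llama.cpp example above).
cat > Modelfile <<'EOF'
FROM ./Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf
EOF

# Register the model under a local name, then chat with it.
ollama create octothinker-inano-1b -f Modelfile
ollama run octothinker-inano-1b "Explain recursion in Python with a simple example."
```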
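
## Downloading a Single File

To fetch one quant without cloning the whole repository, the `huggingface-cli download` command from `huggingface_hub` works well. A sketch, assuming the Q4_K_M filename used in the examples above; check the repo's "Files" tab for the exact names.

```bash
# Install the Hugging Face hub CLI if needed.
pip install -U "huggingface_hub[cli]"

# Download just the Q4_K_M file into the current directory
# (the exact filename is an assumption; verify it on the repo page).
huggingface-cli download gss1147/Llama-3.2-OctoThinker-iNano-1B-GGUF \
  Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf --local-dir .
```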