---
license: other
library_name: gguf
tags:
- gguf
- llama
- llama-3.2
- text-generation
- local-llm
- llama-cpp
- lm-studio
- ollama
- 1b
pipeline_tag: text-generation
language:
- en
base_model_relation: quantized
base_model: gss1147/Llama-3.2-OctoThinker-iNano-1B
---

Llama-3.2-OctoThinker-iNano-1B-GGUF

Model Summary

Llama-3.2-OctoThinker-iNano-1B-GGUF is the GGUF release of the main model, gss1147/Llama-3.2-OctoThinker-iNano-1B, provided in quantized and F16 variants.

Main model repo:
https://huggingface.co/gss1147/Llama-3.2-OctoThinker-iNano-1B

This repository packages the model for efficient local inference in GGUF-compatible runtimes such as llama.cpp, LM Studio, and similar local tools.

For the original non-GGUF model, training/merge details, tokenizer files, and full repository metadata, use the main model repo above.

Available Files

This GGUF repository currently includes:

  • Q4_K_M — 955 MB
  • Q5_K_M — 1.09 GB
  • F16 — 3 GB

Architecture

  • Architecture: llama
  • Model size: 1B params

Intended Use

This model is intended for:

  • local text generation
  • assistant-style prompting
  • lightweight reasoning tasks
  • summarization
  • simple coding help
  • offline/local inference workflows

Quantization Notes

Choose the file that best matches your hardware:

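  • Q4_K_M for the smallest file and lightest RAM usage
  • Q5_K_M for better output quality than Q4_K_M at a modest increase in size
  • F16 for unquantized 16-bit weights: the highest-fidelity file in this repo, roughly three times the size of Q4_K_M and with correspondingly higher memory requirements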
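The following sketch illustrates the size/quality trade-off by picking the highest-fidelity file that fits a given memory budget. It is illustrative only and is not part of any tool. `pick_quant` is a hypothetical helper, and the file sizes come from the Available Files list above. The 1 GB allowance for context (KV cache) and runtime buffers is an assumed placeholder; the actual overhead depends on context length and the runtime you use.

```python
from typing import List, Optional, Tuple

# File sizes (GB) from the "Available Files" list, ordered from highest to lowest fidelity.
QUANT_FILES: List[Tuple[str, float]] = [
    ("F16", 3.0),
    ("Q5_K_M", 1.09),
    ("Q4_K_M", 0.955),
]


def pick_quant(available_gb: float, overhead_gb: float = 1.0) -> Optional[str]:
    """Hypothetical helper: return the highest-fidelity quant that fits in available_gb.

    overhead_gb is an ASSUMED allowance for KV cache and runtime buffers,
    not a measured value.
    """
    for name, size_gb in QUANT_FILES:
        if size_gb + overhead_gb <= available_gb:
            return name
    return None  # no file fits under this budget


if __name__ == "__main__":
    for budget in (1.5, 2.0, 2.5, 8.0):
        print(f"{budget} GB -> {pick_quant(budget)}")
```

Iterating from the largest file down means the first match is the best-quality option that still fits. Tune `overhead_gb` upward if you plan to use long contexts.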

Example llama.cpp Usage

llama-cli -m /path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf -p "Explain recursion in Python with a simple example."
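For scripting, you can assemble the same command from Python. This is a sketch under stated assumptions. `build_llama_cli_args` is a hypothetical helper, and the model path is the placeholder from the command above. The `-n` flag (maximum number of tokens to generate) with a value of 256 is an arbitrary example choice not taken from this card. The script only launches `llama-cli` if the binary is on your PATH and the model file exists.

```python
import os
import shlex
import shutil
import subprocess
from typing import List


def build_llama_cli_args(model_path: str, prompt: str, n_predict: int = 256) -> List[str]:
    """Hypothetical helper: build the argv for a llama-cli run."""
    return [
        "llama-cli",
        "-m", model_path,       # path to the downloaded GGUF file
        "-p", prompt,           # prompt text
        "-n", str(n_predict),   # max number of tokens to generate (example value)
    ]


if __name__ == "__main__":
    model = "/path/to/Llama-3.2-OctoThinker-iNano-1B.Q4_K_M.gguf"  # placeholder path
    args = build_llama_cli_args(model, "Explain recursion in Python with a simple example.")
    print(shlex.join(args))  # shell-quoted command, safe to copy/paste
    # Only run if llama-cli is installed and the model file is actually present.
    if shutil.which("llama-cli") and os.path.isfile(model):
        subprocess.run(args, check=True)
```

Passing arguments as a list rather than a single shell string avoids quoting problems with prompts that contain spaces or quotes.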
Description
Model synced from source: gss1147/Llama-3.2-OctoThinker-iNano-1B-GGUF