---
base_model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
library_name: transformers
pipeline_tag: text-generation
tags:
- gguf
- fine-tuned
- lima
language:
- en
license: apache-2.0
---

# Llama-3.2-1B-Instruct-bnb-4bit-lima - GGUF Format

GGUF quantizations of Llama-3.2-1B-Instruct-bnb-4bit fine-tuned on the LIMA dataset, for use with llama.cpp and Ollama.

## Model Details

- **Base Model**: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- **Format**: GGUF
- **Dataset**: [GAIR/lima](https://huggingface.co/datasets/GAIR/lima)
- **Size**: 0.75 GB (Q4_K_M) to 2.31 GB (F16)
- **Usage**: llama.cpp / Ollama

## Related Models

- **LoRA Adapters**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-lora](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-lora) - Smaller LoRA-only adapters
- **Merged FP16 Model**: [fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima) - Original unquantized model in FP16


## Prompt Format

This model uses the **Llama 3.2** chat template.

### Ollama Template Format

```
{{ if .Messages }}
{{- if or .System .Tools }}<|start_header_id|>system<|end_header_id|>
{{- if .System }}

{{ .System }}
{{- end }}
{{- if .Tools }}

You are a helpful assistant with tool calling capabilities. When you receive a tool call response, use the output to format an answer to the original use question.
{{- end }}
{{- end }}<|eot_id|>
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 }}
{{- if eq .Role "user" }}<|start_header_id|>user<|end_header_id|>
{{- if and $.Tools $last }}

Given the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of argument name and its value}. Do not use variables.

{{ $.Tools }}
{{- end }}

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- else if eq .Role "assistant" }}<|start_header_id|>assistant<|end_header_id|>
{{- if .ToolCalls }}

{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "parameters": {{ .Function.Arguments }}}{{ end }}
{{- else }}

{{ .Content }}{{ if not $last }}<|eot_id|>{{ end }}
{{- end }}
{{- else if eq .Role "tool" }}<|start_header_id|>ipython<|end_header_id|>

{{ .Content }}<|eot_id|>{{ if $last }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}
{{- end }}
{{- end }}
{{- else }}
{{- if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ end }}{{ .Response }}{{ if .Response }}<|eot_id|>{{ end }}
```
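As a rough illustration of what the template above produces for a plain conversation (a system prompt first, no tools, last message from the user), the sketch below builds the same header and `<|eot_id|>` layout in Python. `render_llama3_prompt` is a hypothetical helper, not part of Ollama or llama.cpp, and it deliberately leaves out the tool-calling branches.

```python
def render_llama3_prompt(messages):
    """Render chat messages in the Llama 3 header/eot layout.

    Simplified sketch: supports only system, user and assistant roles,
    and expects the conversation to end with a user message.
    """
    if not messages or messages[-1]["role"] != "user":
        raise ValueError("the last message must come from the user")

    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role in this sketch: {role}")
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{role}<|end_header_id|>\n\n{content}<|eot_id|>"
        )

    # Open an assistant turn so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_prompt = render_llama3_prompt(
    [
        {"role": "system", "content": "You are concise."},
        {"role": "user", "content": "Hi"},
    ]
)
print(example_prompt)
```

Ollama and llama.cpp apply the chat template for you when you use their chat interfaces, so a helper like this is mainly useful for inspecting or debugging raw prompts.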


## Training Details

- **LoRA Rank**: 64
- **Training Steps**: 480
- **Training Loss**: 1.1123
- **Max Seq Length**: 2048
- **Training Scope**: 1,278 samples (full dataset), 3 epochs

For the complete training configuration, see the [LoRA adapters repository](https://huggingface.co/fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-lora).
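The reported step count can be sanity-checked with a little arithmetic. The sketch below assumes an effective batch size of 8 (per-device batch size times gradient accumulation steps). That value is not stated in this README; it is only an inference that happens to reproduce the 480 reported steps, and the actual configuration may differ.

```python
import math

SAMPLES = 1278               # from "Training Scope"
EPOCHS = 3                   # from "Training Scope"
REPORTED_STEPS = 480         # from "Training Steps"
ASSUMED_EFFECTIVE_BATCH = 8  # assumption, not stated in this README


def total_steps(samples, epochs, effective_batch):
    """Optimizer steps when each epoch is rounded up to whole batches."""
    return math.ceil(samples / effective_batch) * epochs


steps = total_steps(SAMPLES, EPOCHS, ASSUMED_EFFECTIVE_BATCH)
print(steps, steps == REPORTED_STEPS)  # prints "480 True"
```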

## Available Quantizations

| Quantization | File | Size | Quality |
|--------------|------|------|---------|
| **F16** | [Llama-3.2-1B-Instruct-bnb-4bit-lima-F16.gguf](Llama-3.2-1B-Instruct-bnb-4bit-lima-F16.gguf) | 2.31 GB | Unquantized FP16 (largest) |
| **Q4_K_M** | [Llama-3.2-1B-Instruct-bnb-4bit-lima-Q4_K_M.gguf](Llama-3.2-1B-Instruct-bnb-4bit-lima-Q4_K_M.gguf) | 0.75 GB | Good balance (recommended) |
| **Q6_K** | [Llama-3.2-1B-Instruct-bnb-4bit-lima-Q6_K.gguf](Llama-3.2-1B-Instruct-bnb-4bit-lima-Q6_K.gguf) | 0.95 GB | High quality |
| **Q8_0** | [Llama-3.2-1B-Instruct-bnb-4bit-lima-Q8_0.gguf](Llama-3.2-1B-Instruct-bnb-4bit-lima-Q8_0.gguf) | 1.23 GB | Very high quality, near original |

**Usage:** On the Hugging Face model page, use the dropdown menu to select a quantization, then follow the instructions Hugging Face provides.
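For scripted use, one possible approach is to download a file with `huggingface_hub` and load it with `llama-cpp-python`. This is a minimal sketch under several assumptions: the repository ID below is inferred from the names of the related repositories and is not confirmed here, both packages are installed, and the 2048-token context simply mirrors the training sequence length. `gguf_filename`, `load_model`, and `chat` are hypothetical helper names.

```python
# Assumed repository ID; adjust it if the files live elsewhere.
REPO_ID = "fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-GGUF"
QUANTS = ("F16", "Q4_K_M", "Q6_K", "Q8_0")  # quantizations listed in the table


def gguf_filename(quant="Q4_K_M"):
    """Return the GGUF file name for one of the listed quantizations."""
    if quant not in QUANTS:
        raise ValueError(f"unknown quantization {quant!r}; choose from {QUANTS}")
    return f"Llama-3.2-1B-Instruct-bnb-4bit-lima-{quant}.gguf"


def load_model(quant="Q4_K_M"):
    """Download the chosen file (cached locally) and load it with llama.cpp."""
    # Third-party imports stay local so the filename helper works without them.
    from huggingface_hub import hf_hub_download
    from llama_cpp import Llama

    model_path = hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant))
    return Llama(model_path=model_path, n_ctx=2048, verbose=False)


def chat(prompt, quant="Q4_K_M", max_tokens=128):
    """Send a single user message and return the assistant's reply text."""
    llm = load_model(quant)
    reply = llm.create_chat_completion(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
    )
    return reply["choices"][0]["message"]["content"]
```

Calling `chat("Give me one tip for writing clear emails.")` downloads the Q4_K_M file on first use and returns the generated text. The chat-completion call should pick up the chat template from the GGUF metadata when one is embedded, so the prompt does not need to be formatted by hand.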

## License

Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on GAIR/lima.
Please refer to the original model and dataset licenses.

## Credits

**Trained by:** Farhan Syah

**Training pipeline:**
- [unsloth-finetuning](https://github.com/farhan-syah/unsloth-finetuning) by [@farhan-syah](https://github.com/farhan-syah)
- [Unsloth](https://github.com/unslothai/unsloth) - 2x faster LLM fine-tuning

**Base components:**
- Base model: [unsloth/Llama-3.2-1B-Instruct-bnb-4bit](https://huggingface.co/unsloth/Llama-3.2-1B-Instruct-bnb-4bit)
- Training dataset: [GAIR/lima](https://huggingface.co/datasets/GAIR/lima) by GAIR