Llama_3_2_3B_SFT_GGUF/README.md

---
license: mit
datasets:
- mlabonne/FineTome-100k
language:
- en
base_model:
- unsloth/Llama-3.2-3B-Instruct
pipeline_tag: question-answering
---
# Llama-3.2-3B-Instruct Fine-Tuning on Custom Dataset

## Overview

This repository demonstrates the process of fine-tuning the **Llama-3.2-3B-Instruct** model using the **Unsloth** library. The model is trained on a custom dataset, **FineTome-100k**, for **60 steps**. Key optimizations include:

- **4-bit quantization** to reduce memory usage
- **LoRA (Low-Rank Adaptation)** for efficient fine-tuning
- Techniques for improving inference speed and generating text with the model

## Model Details

- **Model Name**: Llama-3.2-3B-Instruct
- **Pretrained Weights**: Unsloth’s pretrained version for Llama-3.2-3B
- **Quantization**: 4-bit quantization (set via `load_in_4bit=True`) for reduced memory usage

### LoRA Configuration:
- **Rank**: 16
- **Target Modules**: 
  - q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **LoRA Alpha**: 16
- **LoRA Dropout**: 0

### Gradient Checkpointing:
- **Use Gradient Checkpointing**: "unsloth" for improved long-context training

## Training

- **Dataset**: FineTome-100k (first 500 records selected)
- **Loss Function**: Standard loss for sequence-to-sequence tasks
- **Training Steps**: 60 steps with batch size of 2 (gradient accumulation steps set to 4)
- **Optimizer**: AdamW 8-bit

### Training Parameters:
- **Max Sequence Length**: 2048 tokens
- **Learning Rate**: 2e-4
- **Gradient Accumulation Steps**: 4
- **Total Steps**: 60
- **Epochs**: 1 (as `max_steps` was set to 60)
- **Training Precision**: Use FP16 or BF16 for training depending on GPU support

## Performance

- **GPU Used**: Tesla T4 (14.7 GB max memory)

### Peak Memory Usage:
- **Total Reserved Memory**: 3.855 GB
- **Memory Used for LoRA**: 1.312 GB
- **Memory Utilization**: 26.1% (peak) of available memory

## Conclusion

This notebook showcases an efficient approach to fine-tuning large language models with memory optimizations and improved training efficiency using **LoRA** and **4-bit quantization**. The **Unsloth** library allows for fast training and inference, making this setup ideal for large-scale tasks even with limited GPU resources.

## Notebook

Access the implementation notebook for this model [here](https://github.com/SURESHBEEKHANI/Advanced-LLM-Fine-Tuning/blob/main/Llama_3_2_3B_SFT_GGUF.ipynb). This notebook provides detailed steps for fine-tuning and deploying the model.
-												初始化项目，由ModelHub XC社区提供模型

Model: SURESHBEEKHANI/Llama_3_2_3B_SFT_GGUF
Source: Original Platform

											
										
										
											2026-04-20 13:53:07 +08:00
+								---
 								license: mit
 								datasets:
 								- mlabonne/FineTome-100k
 								language:
 								- en
 								base_model:
 								- unsloth/Llama-3.2-3B-Instruct
 								pipeline_tag: question-answering
 								---
 								# Llama-3.2-3B-Instruct Fine-Tuning on Custom Dataset
 								## Overview
 								This repository demonstrates the process of fine-tuning the **Llama-3.2-3B-Instruct** model using the **Unsloth** library. The model is trained on a custom dataset, **FineTome-100k**, for **60 steps**. Key optimizations include:
 								- **4-bit quantization** to reduce memory usage
 								- **LoRA (Low-Rank Adaptation)** for efficient fine-tuning
 								- Techniques for improving inference speed and generating text with the model
 								## Model Details
 								- **Model Name**: Llama-3.2-3B-Instruct
 								- **Pretrained Weights**: Unsloth’s pretrained version for Llama-3.2-3B
 								- **Quantization**: 4-bit quantization (set via `load_in_4bit=True`) for reduced memory usage
 								### LoRA Configuration:
 								- **Rank**: 16
 								- **Target Modules**:
 								  - q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
 								- **LoRA Alpha**: 16
 								- **LoRA Dropout**: 0
 								### Gradient Checkpointing:
 								- **Use Gradient Checkpointing**: "unsloth" for improved long-context training
 								## Training
 								- **Dataset**: FineTome-100k (first 500 records selected)
 								- **Loss Function**: Standard loss for sequence-to-sequence tasks
 								- **Training Steps**: 60 steps with batch size of 2 (gradient accumulation steps set to 4)
 								- **Optimizer**: AdamW 8-bit
 								### Training Parameters:
 								- **Max Sequence Length**: 2048 tokens
 								- **Learning Rate**: 2e-4
 								- **Gradient Accumulation Steps**: 4
 								- **Total Steps**: 60
 								- **Epochs**: 1 (as `max_steps` was set to 60)
 								- **Training Precision**: Use FP16 or BF16 for training depending on GPU support
 								## Performance
 								- **GPU Used**: Tesla T4 (14.7 GB max memory)
 								### Peak Memory Usage:
 								- **Total Reserved Memory**: 3.855 GB
 								- **Memory Used for LoRA**: 1.312 GB
 								- **Memory Utilization**: 26.1% (peak) of available memory
 								## Conclusion
 								This notebook showcases an efficient approach to fine-tuning large language models with memory optimizations and improved training efficiency using **LoRA** and **4-bit quantization**. The **Unsloth** library allows for fast training and inference, making this setup ideal for large-scale tasks even with limited GPU resources.
 								## Notebook
 								Access the implementation notebook for this model [here](https://github.com/SURESHBEEKHANI/Advanced-LLM-Fine-Tuning/blob/main/Llama_3_2_3B_SFT_GGUF.ipynb). This notebook provides detailed steps for fine-tuning and deploying the model.