---
license: mit
datasets:
- mlabonne/FineTome-100k
language:
- en
base_model:
- unsloth/Llama-3.2-3B-Instruct
pipeline_tag: question-answering
---

# Llama-3.2-3B-Instruct Fine-Tuning on Custom Dataset

## Overview

This repository demonstrates fine-tuning the **Llama-3.2-3B-Instruct** model with the **Unsloth** library. The model is trained on the first 500 records of the **FineTome-100k** dataset (`mlabonne/FineTome-100k`) for **60 steps**. Key optimizations include:

- **4-bit quantization** to reduce memory usage
- **LoRA (Low-Rank Adaptation)** for efficient fine-tuning
- Techniques for improving inference speed and generating text with the model

## Model Details

- **Model Name**: Llama-3.2-3B-Instruct
- **Pretrained Weights**: `unsloth/Llama-3.2-3B-Instruct`
- **Quantization**: 4-bit (set via `load_in_4bit=True`) for reduced memory usage

### LoRA Configuration:

- **Rank**: 16
- **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **LoRA Alpha**: 16
- **LoRA Dropout**: 0

### Gradient Checkpointing:

- **Use Gradient Checkpointing**: `"unsloth"`, for improved long-context training

## Training

- **Dataset**: FineTome-100k (first 500 records selected)
- **Loss Function**: Standard cross-entropy (next-token prediction) loss for causal language modeling
- **Training Steps**: 60 steps with a per-device batch size of 2 and 4 gradient accumulation steps (effective batch size of 8)
- **Optimizer**: AdamW 8-bit

### Training Parameters:

- **Max Sequence Length**: 2048 tokens
- **Learning Rate**: 2e-4
- **Gradient Accumulation Steps**: 4
- **Total Steps**: 60
- **Epochs**: About 1 (`max_steps` was set to 60, which at an effective batch size of 8 covers 480 of the 500 selected records)
- **Training Precision**: FP16 or BF16, depending on GPU support

## Performance

- **GPU Used**: Tesla T4 (14.7 GB max memory)

### Peak Memory Usage:

- **Total Reserved Memory**: 3.855 GB
- **Memory Used for LoRA**: 1.312 GB
- **Memory Utilization**: 26.1% of available GPU memory at peak

## Conclusion

This notebook shows a memory-efficient approach to fine-tuning a large language model using **LoRA** and **4-bit quantization**. With the **Unsloth** library, training peaked at 3.855 GB of reserved memory, which makes this setup practical on a single GPU with limited memory, such as the Tesla T4 used here.

## Notebook

The implementation notebook for this model is available [here](https://github.com/SURESHBEEKHANI/Advanced-LLM-Fine-Tuning/blob/main/Llama_3_2_3B_SFT_GGUF.ipynb). It provides detailed steps for fine-tuning and deploying the model.
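
## Example: Checking the Training Budget

The training figures quoted above (batch size 2, 4 gradient accumulation steps, 60 steps, 500 records) determine how much of the data the run actually sees. The following is a minimal sketch of that arithmetic in plain Python. The `TrainingBudget` class and the `memory_share` helper are hypothetical names introduced for illustration and are not part of Unsloth or the notebook; the single-GPU setting is an assumption based on the Tesla T4 listed under Performance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingBudget:
    """Hyperparameters quoted in this README (the class itself is hypothetical)."""
    per_device_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    max_steps: int = 60
    num_records: int = 500
    num_devices: int = 1  # assumption: one Tesla T4

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer step.
        return (self.per_device_batch_size
                * self.gradient_accumulation_steps
                * self.num_devices)

    @property
    def examples_seen(self) -> int:
        # Total examples consumed over the whole run.
        return self.max_steps * self.effective_batch_size

    @property
    def epochs(self) -> float:
        # Fraction of the selected records covered by the run.
        return self.examples_seen / self.num_records


def memory_share(used_gb: float, total_gb: float) -> float:
    """Return used memory as a percentage of total, rounded to one decimal."""
    return round(100 * used_gb / total_gb, 1)


budget = TrainingBudget()
print(budget.effective_batch_size)  # 8
print(budget.examples_seen)         # 480
print(budget.epochs)                # 0.96
print(memory_share(3.855, 14.7))    # 26.2
```

The run therefore covers slightly less than one full pass over the 500 selected records. The memory share computed from the rounded 14.7 GB total comes out at 26.2%, close to the reported 26.1%; the small gap is presumably because the reported figure was computed from an unrounded GPU total.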