Llama_3_2_3B_SFT_GGUF/README.md
ModelHub XC 96910917b9 Initialize project; model provided by the ModelHub XC community
Model: SURESHBEEKHANI/Llama_3_2_3B_SFT_GGUF
Source: Original Platform
2026-04-20 13:53:07 +08:00

---
license: mit
datasets:
- mlabonne/FineTome-100k
language:
- en
base_model:
- unsloth/Llama-3.2-3B-Instruct
pipeline_tag: question-answering
---
# Llama-3.2-3B-Instruct Fine-Tuning on Custom Dataset
## Overview
This repository demonstrates fine-tuning the **Llama-3.2-3B-Instruct** model with the **Unsloth** library. The model is trained on a subset of the **FineTome-100k** dataset (`mlabonne/FineTome-100k`) for **60 steps**. Key optimizations include:
- **4-bit quantization** to reduce memory usage
- **LoRA (Low-Rank Adaptation)** for efficient fine-tuning
- Techniques for improving inference speed and generating text with the model
## Model Details
- **Model Name**: Llama-3.2-3B-Instruct
- **Pretrained Weights**: Unsloth's pretrained release, `unsloth/Llama-3.2-3B-Instruct`
- **Quantization**: 4-bit quantization (set via `load_in_4bit=True`) for reduced memory usage
### LoRA Configuration:
- **Rank**: 16
- **Target Modules**: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **LoRA Alpha**: 16
- **LoRA Dropout**: 0
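
To make the LoRA settings above concrete, the following sketch counts the adapter parameters they imply. The layer shapes (hidden size 3072, MLP size 8192, 28 layers, 8 key/value heads of dimension 128) are assumptions taken from the public Llama-3.2-3B configuration, not from this README.

```python
# Assumed Llama-3.2-3B shapes (from the public model config, not this README).
HIDDEN = 3072
INTERMEDIATE = 8192
NUM_LAYERS = 28
KV_DIM = 8 * 128  # grouped-query attention: output width of k_proj / v_proj

# LoRA settings listed above.
RANK = 16
ALPHA = 16

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # LoRA adds two small matrices per weight: A (rank x in) and B (out x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
total_trainable = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # the low-rank update is multiplied by alpha / rank

print(f"trainable LoRA parameters: {total_trainable:,}")
print(f"LoRA scaling factor: {scaling}")
```

Under these assumed shapes the adapters add roughly 24 million trainable parameters, under 1% of the 3B base model, and with alpha equal to the rank the update is applied at a scale of 1.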
### Gradient Checkpointing:
- **Use Gradient Checkpointing**: `use_gradient_checkpointing="unsloth"`, Unsloth's checkpointing mode, which lowers activation memory for long-context training
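
The model and adapter settings listed above might be wired together roughly as follows. This is a minimal sketch, not the notebook's exact code: the function name `load_model_with_lora` is hypothetical, the sequence length is taken from the training parameters in this README, and the Unsloth import is deferred because it needs a CUDA GPU.

```python
MODEL_NAME = "unsloth/Llama-3.2-3B-Instruct"  # base_model from the metadata
MAX_SEQ_LENGTH = 2048                         # from the training parameters

# LoRA and checkpointing settings as listed in this README.
LORA_KWARGS = {
    "r": 16,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "lora_alpha": 16,
    "lora_dropout": 0,
    "use_gradient_checkpointing": "unsloth",
}


def load_model_with_lora():
    """Load the 4-bit base model and attach LoRA adapters (requires a GPU)."""
    # Imported lazily so the settings above can be inspected without Unsloth.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=MODEL_NAME,
        max_seq_length=MAX_SEQ_LENGTH,
        load_in_4bit=True,  # 4-bit quantization of the frozen base weights
    )
    model = FastLanguageModel.get_peft_model(model, **LORA_KWARGS)
    return model, tokenizer
```

Keeping the settings in a plain dictionary makes it easy to log or compare them between runs before any weights are downloaded.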
## Training
- **Dataset**: FineTome-100k (first 500 records selected)
- **Loss Function**: Standard causal language-modeling loss (next-token cross-entropy)
- **Training Steps**: 60 steps with a per-device batch size of 2 and 4 gradient accumulation steps (effective batch size of 8)
- **Optimizer**: AdamW 8-bit
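
The step and batch settings above determine how much of the 500-record subset the run actually sees. The arithmetic below is a sketch that assumes a single GPU and one training example per record.

```python
per_device_batch_size = 2
gradient_accumulation_steps = 4
max_steps = 60
dataset_size = 500  # first 500 records of FineTome-100k
num_devices = 1     # assumption: single-GPU training

# Examples contributing to each optimizer update.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
# Examples consumed over the whole run, and the share of the subset covered.
examples_seen = effective_batch_size * max_steps
epochs_covered = examples_seen / dataset_size

print(effective_batch_size, examples_seen, epochs_covered)
```

Under those assumptions each update averages over 8 examples, and 60 steps consume 480 examples, or 0.96 of one pass over the subset.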
### Training Parameters:
- **Max Sequence Length**: 2048 tokens
- **Learning Rate**: 2e-4
- **Gradient Accumulation Steps**: 4
- **Total Steps**: 60
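- **Epochs**: just under 1 (training was capped by `max_steps=60`, which covers 480 of the 500 selected records at an effective batch size of 8)
- **Training Precision**: FP16 or BF16, chosen by GPU support (the Tesla T4 used here lacks native BF16 support, so FP16 applies)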
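
One way these parameters could be expressed as Hugging Face `TrainingArguments` is sketched below. The helper names and the `outputs` directory are hypothetical, and the maximum sequence length is not included because it is passed to the model loader and trainer rather than to `TrainingArguments`.

```python
# Hyperparameters as listed in this README; "outputs" is a placeholder path.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "max_steps": 60,
    "learning_rate": 2e-4,
    "optim": "adamw_8bit",
    "output_dir": "outputs",
}


def precision_flags(bf16_supported):
    """Enable exactly one of fp16/bf16, preferring bf16 when available."""
    return {"fp16": not bf16_supported, "bf16": bf16_supported}


def build_training_arguments():
    """Build TrainingArguments for the current GPU (requires a GPU setup)."""
    # Imported lazily: both packages are heavy and need a CUDA environment.
    from transformers import TrainingArguments
    from unsloth import is_bfloat16_supported

    flags = precision_flags(is_bfloat16_supported())
    return TrainingArguments(**TRAINING_KWARGS, **flags)
```

Selecting the precision from a hardware check keeps the same configuration usable on GPUs with and without BF16 support.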
## Performance
- **GPU Used**: Tesla T4 (14.7 GB max memory)
### Peak Memory Usage:
- **Total Reserved Memory**: 3.855 GB
- **Memory Used for LoRA**: 1.312 GB
- **Memory Utilization**: 26.1% (peak) of available memory
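
Figures like these can be collected from PyTorch's CUDA memory counters. The sketch below assumes that "Memory Used for LoRA" means peak reserved memory minus the memory already reserved before training began; the README does not state how it was measured, and the helper names are hypothetical.

```python
GIB = 1024 ** 3


def memory_report(peak_reserved_bytes, start_reserved_bytes, total_bytes):
    """Summarize peak GPU memory in GB and as a share of the device total."""
    peak_gb = round(peak_reserved_bytes / GIB, 3)
    training_gb = round((peak_reserved_bytes - start_reserved_bytes) / GIB, 3)
    total_gb = round(total_bytes / GIB, 3)
    return {
        "peak_reserved_gb": peak_gb,
        "training_gb": training_gb,  # growth attributed to the training run
        "peak_percent": round(peak_gb / total_gb * 100, 3),
    }


def measure(start_reserved_bytes):
    """Read the counters from the first CUDA device (requires a GPU)."""
    import torch  # imported lazily so the report function works without it

    return memory_report(
        torch.cuda.max_memory_reserved(),
        start_reserved_bytes,
        torch.cuda.get_device_properties(0).total_memory,
    )
```

In use, `torch.cuda.max_memory_reserved()` would be recorded once before training to obtain `start_reserved_bytes`, and `measure` called after training finishes.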
## Conclusion
This repository shows a memory-efficient approach to fine-tuning a large language model using **LoRA** and **4-bit quantization**. With the **Unsloth** library, the full run peaked at about 3.9 GB of reserved GPU memory, which makes this setup practical on a single Tesla T4 and similar resource-limited GPUs.
## Notebook
Access the implementation notebook for this model [here](https://github.com/SURESHBEEKHANI/Advanced-LLM-Fine-Tuning/blob/main/Llama_3_2_3B_SFT_GGUF.ipynb). This notebook provides detailed steps for fine-tuning and deploying the model.