---
language:
- en
license: apache-2.0
tags:
- text-generation-inference
- transformers
- unsloth
- llama
- trl
- sft
- axium
base_model: unsloth/meta-llama-3.1-8b-bnb-4bit
---

# About the uploaded model

- **Developed by:** prithivMLmods
- **License:** apache-2.0
- **Finetuned from model:** unsloth/meta-llama-3.1-8b-bnb-4bit

**The model is still in the training phase. This is not the final version; it may contain artifacts and perform poorly in some cases.**

## Trainer Configuration

| **Parameter** | **Value** |
|---|---|
| **Model** | `model` |
| **Tokenizer** | `tokenizer` |
| **Train Dataset** | `dataset` |
| **Dataset Text Field** | `text` |
| **Max Sequence Length** | `max_seq_length` |
| **Dataset Number of Processes** | `2` |
| **Packing** | `False` (setting it to `True` can make training 5x faster for short sequences) |
| **Training Arguments** | |
| - **Per Device Train Batch Size** | `2` |
| - **Gradient Accumulation Steps** | `4` |
| - **Warmup Steps** | `5` |
| - **Number of Train Epochs** | `1` (one full pass over the dataset) |
| - **Max Steps** | `60` (a positive value overrides the epoch count) |
| - **Learning Rate** | `2e-4` |
| - **FP16** | `not is_bfloat16_supported()` |
| - **BF16** | `is_bfloat16_supported()` |
| - **Logging Steps** | `1` |
| - **Optimizer** | `adamw_8bit` |
| - **Weight Decay** | `0.01` |
| - **LR Scheduler Type** | `linear` |
| - **Seed** | `3407` |
| - **Output Directory** | `outputs` |

This Llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
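
To make the numbers in the Trainer Configuration table more concrete, the following is a minimal sketch of what they imply for the effective batch size and the learning-rate curve. It is not the training script used for this model. The `TrainerSettings` class and both helper functions are hypothetical names introduced for illustration, a single training device is assumed, and the schedule formula is the usual linear-warmup, linear-decay shape that a `linear` scheduler is generally understood to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainerSettings:
    """Values copied from the Trainer Configuration table (hypothetical container)."""
    per_device_train_batch_size: int = 2
    gradient_accumulation_steps: int = 4
    warmup_steps: int = 5
    max_steps: int = 60
    learning_rate: float = 2e-4


def effective_batch_size(cfg: TrainerSettings, num_devices: int = 1) -> int:
    """Sequences contributing to each optimizer step (assumes num_devices GPUs)."""
    return cfg.per_device_train_batch_size * cfg.gradient_accumulation_steps * num_devices


def linear_lr(cfg: TrainerSettings, step: int) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay to 0 at max_steps."""
    if step < cfg.warmup_steps:
        scale = step / max(1, cfg.warmup_steps)
    else:
        remaining = cfg.max_steps - step
        scale = max(0.0, remaining / max(1, cfg.max_steps - cfg.warmup_steps))
    return cfg.learning_rate * scale


cfg = TrainerSettings()
print(effective_batch_size(cfg))                  # 8 sequences per optimizer step
print(effective_batch_size(cfg) * cfg.max_steps)  # 480 sequences over the whole run
for step in (0, 5, 30, 60):
    print(step, linear_lr(cfg, step))
```

Under these assumptions, 60 steps at an effective batch size of 8 cover 480 training sequences, so whether the run amounts to a full epoch depends on the size of `dataset`, which the table does not state.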