---
library_name: transformers
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- peft-factory
- full
- llama-factory
- generated_from_trainer
model-index:
- name: train_sst2_42_1779207274
  results: []
---


# train_sst2_42_1779207274

This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the sst2 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0970 (matches the lowest validation loss logged during training, at step 5685, epoch 0.7503)
- Num Input Tokens Seen: 18647328

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5
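
To make the schedule concrete, the sketch below computes the learning rate at a given step for a cosine schedule with linear warmup, using the `learning_rate` and `lr_scheduler_warmup_ratio` listed above. It is a plain-Python illustration, not the Trainer's own code. `TOTAL_STEPS` is an assumption: it is estimated from the training results table (1895 steps correspond to roughly 0.2501 epochs, so 5 epochs come to about 37885 steps), and the rounding of the warmup length is also assumed.

```python
import math

PEAK_LR = 2e-6        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 37885   # ASSUMPTION: estimated from the results table, not logged directly


def cosine_lr_with_warmup(step, total_steps=TOTAL_STEPS,
                          peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    # ASSUMPTION: the warmup length is rounded up to a whole number of steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 to peak_lr over the warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 1895, 3789, 18950, 36005, TOTAL_STEPS):
        print(f"step {s:>6}: lr = {cosine_lr_with_warmup(s):.3e}")
```

Under these assumptions the learning rate peaks at 2e-06 around step 3789, which is about half an epoch in, and decays towards zero by the end of the fifth epoch.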

### Training results

| Training Loss | Epoch  | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:-----------------:|
| 0.4074        | 0.2501 | 1895  | 0.1552          | 930944            |
| 0.3196        | 0.5002 | 3790  | 0.1577          | 1864128           |
| 0.0028        | 0.7503 | 5685  | 0.0970          | 2790656           |
| 0.0006        | 1.0004 | 7580  | 0.1143          | 3726464           |
| 0.1179        | 1.2505 | 9475  | 0.1166          | 4658240           |
| 0.1073        | 1.5006 | 11370 | 0.1257          | 5591680           |
| 0.342         | 1.7507 | 13265 | 0.1152          | 6528448           |
| 0.0004        | 2.0008 | 15160 | 0.1182          | 7463024           |
| 0.0556        | 2.2509 | 17055 | 0.1500          | 8395632           |
| 0.0962        | 2.5010 | 18950 | 0.1142          | 9326256           |
| 0.0429        | 2.7511 | 20845 | 0.1603          | 10259504          |
| 0.0352        | 3.0012 | 22740 | 0.1483          | 11196096          |
| 0.0352        | 3.2513 | 24635 | 0.1809          | 12128448          |
| 0.0           | 3.5014 | 26530 | 0.1809          | 13069824          |
| 0.0243        | 3.7515 | 28425 | 0.2036          | 13996672          |
| 0.0002        | 4.0016 | 30320 | 0.1816          | 14924944          |
| 0.0087        | 4.2517 | 32215 | 0.2473          | 15859920          |
| 0.0           | 4.5018 | 34110 | 0.2764          | 16790288          |
| 0.0           | 4.7519 | 36005 | 0.2836          | 17721744          |
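
As a quick way to read the table, the following sketch picks the evaluation step with the lowest validation loss. The `(step, validation_loss)` pairs are copied from the rows above. The names `EVAL_LOG` and `best_checkpoint` are illustrative and not part of any library.

```python
# (step, validation_loss) pairs copied from the training results table
EVAL_LOG = [
    (1895, 0.1552), (3790, 0.1577), (5685, 0.0970), (7580, 0.1143),
    (9475, 0.1166), (11370, 0.1257), (13265, 0.1152), (15160, 0.1182),
    (17055, 0.1500), (18950, 0.1142), (20845, 0.1603), (22740, 0.1483),
    (24635, 0.1809), (26530, 0.1809), (28425, 0.2036), (30320, 0.1816),
    (32215, 0.2473), (34110, 0.2764), (36005, 0.2836),
]


def best_checkpoint(log):
    """Return the (step, validation_loss) pair with the smallest validation loss."""
    return min(log, key=lambda row: row[1])


best_step, best_loss = best_checkpoint(EVAL_LOG)
print(f"lowest validation loss {best_loss:.4f} at step {best_step}")
```

The minimum falls at step 5685, before the end of the first epoch. Validation loss then trends upward while training loss approaches zero, which may indicate overfitting in the later epochs.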


### Framework versions

- Transformers 4.51.3
- PyTorch 2.10.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4