This model fine-tunes OPT-125M on a reduced corpus of mC4-Portuguese containing approximately 300M tokens.
Hyper-parameters
learning_rate = 5e-5
batch_size = 32
warmup = 500
seq_length = 512
num_train_epochs = 2.0
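
These settings map onto the HuggingFace Trainer API roughly as sketched below. This is not the released training script: the two-sentence toy dataset stands in for the ~300M-token mC4-Portuguese corpus, and the script structure and column names are assumptions.

```python
# Minimal sketch of a training setup matching the hyper-parameters above.
# The toy dataset is a stand-in for the mC4-Portuguese corpus.
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")

# Tokenize raw text into sequences of at most seq_length = 512 tokens.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

toy_texts = ["Em uma bela manhã de sol.", "O gato dorme no sofá."]
train_dataset = Dataset.from_dict({"text": toy_texts}).map(
    tokenize, batched=True, remove_columns=["text"]
)

args = TrainingArguments(
    output_dir="opt-125M-pt-br-finetuned",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    warmup_steps=500,
    num_train_epochs=2.0,
)

# mlm=False gives plain causal language modeling (labels are shifted inputs).
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```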
On a single A100 GPU with 40 GB of memory, training took around 3 hours.
Perplexity: 9.4
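
For reference, perplexity is the exponential of the mean cross-entropy loss on the evaluation set; the loss value below is illustrative, back-computed from the reported 9.4.

```python
import math

# Perplexity = exp(mean cross-entropy loss on the evaluation set).
eval_loss = 2.2407  # illustrative value, back-computed from ln(9.4)
print(f"{math.exp(eval_loss):.1f}")  # -> 9.4
```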
Sample Use
```python
from transformers import pipeline

# Load the fine-tuned checkpoint as a text-generation pipeline.
generator = pipeline(
    'text-generation',
    model='Mirelle/opt-125M-pt-br-finetuned',
    max_length=100,
    do_sample=True,
)

generator("Em uma bela manhã de")
```
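
Equivalently, the checkpoint can be loaded without the pipeline helper (a sketch using the standard transformers API; the generation parameters simply mirror the example above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Mirelle/opt-125M-pt-br-finetuned")
model = AutoModelForCausalLM.from_pretrained("Mirelle/opt-125M-pt-br-finetuned")

# Encode the Portuguese prompt and sample a continuation.
inputs = tokenizer("Em uma bela manhã de", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```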